LINKS

Custom Perl 5 scripts

Script - cds2aa.v2.pl

The Perl 5 script cds2aa.v2.pl translates CDS sequences into amino acid sequences in batch.
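
To illustrate what batch CDS-to-protein translation involves, here is a minimal Python sketch. It is not the cds2aa.v2.pl script itself, whose options and output format are not described here. It assumes the standard genetic code (NCBI translation table 1) and FASTA input; the function names translate and read_fasta are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), laid out in the
# conventional T, C, A, G codon order. Assumption: the CDS uses this table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds, to_stop=True):
    """Translate one CDS codon by codon; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    protein = []
    # Ignore any trailing 1-2 bases that do not form a full codon.
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


def read_fasta(lines):
    """Parse FASTA lines into a dict of {header: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Batch use: translate every record of a small in-memory FASTA.
cds_records = read_fasta([">gene1", "ATGGCC", "TAA", ">gene2", "ATGAAATGA"])
proteins = {name: translate(seq) for name, seq in cds_records.items()}
```

With to_stop=True the translation ends at the first stop codon; passing to_stop=False keeps stop codons as "*" in the output.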

Script - fatools.pl

The Perl 5 script fatools.pl extracts CDS sequences in batch from a target sequence, using its gene list .tab file.
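
The idea of cutting CDS sequences out of a target sequence by coordinates can be sketched in Python as follows. This is not fatools.pl itself. The gene list layout used here (name, start, end, strand, tab-separated, 1-based inclusive coordinates) is an assumption made for illustration, since the actual .tab format is not specified on this page; all function names are hypothetical.

```python
# Translation table used to complement DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_gene_list(lines):
    """Parse tab-separated rows of: name, start, end, strand (assumed layout)."""
    genes = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        name, start, end, strand = line.split("\t")[:4]
        genes.append((name, int(start), int(end), strand))
    return genes


def extract_cds(target, genes):
    """Return {name: CDS sequence} for each gene on the target sequence."""
    cds = {}
    for name, start, end, strand in genes:
        # Convert 1-based inclusive coordinates to a Python slice.
        fragment = target[start - 1:end]
        if strand == "-":
            fragment = reverse_complement(fragment)
        cds[name] = fragment
    return cds


target = "AAATGCCCTAGGG"
genes = parse_gene_list(["g1\t3\t11\t+", "g2\t1\t4\t-"])
cds = extract_cds(target, genes)
```

Genes on the minus strand are reverse-complemented so that every extracted CDS reads 5' to 3'.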

Script - genbank2tab.pl

The Perl 5 script genbank2tab.pl converts a standard GenBank (.gbk) annotation file into a gene list .tab file.

Sequence assembly

Fastp

Fastp is a tool designed to provide fast all-in-one preprocessing for FASTQ files. It is written in C++ and supports multithreading for high performance.

SPAdes

SPAdes is an assembly toolkit containing various assembly pipelines. The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads.

Trimmomatic

Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina FASTQ data as well as to remove adapters.

Sequence alignment and comparison

Clustal W

Clustal W is a general purpose multiple alignment program for DNA or proteins.

Mauve

Mauve is a system for constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. Multiple genome alignments provide a basis for research into comparative genomics and the study of genome-wide evolutionary dynamics.

MUSCLE

MUSCLE is a fast and accurate multiple alignment program that can align hundreds of sequences in seconds.

MSA

MSA (Multiple Sequence Alignment) is generally the alignment of three or more biological sequences of similar length. From the output, homology can be inferred and the evolutionary relationships between the sequences can be studied.

PSA

PSA (Pairwise Sequence Alignment) is used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences.
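
As a rough illustration of how a pairwise alignment is scored, the sketch below computes a global alignment score with the classic Needleman-Wunsch dynamic programming recurrence. It is not the algorithm or scoring scheme of the linked PSA service, which is not described here. The match, mismatch and gap values are arbitrary assumptions, and the function name is hypothetical.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the DP matrix,
    so memory use is proportional to len(b).
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = prev[j - 1] + substitution  # align a[i-1] with b[j-1]
            up = prev[j] + gap                     # gap in b
            left = curr[j - 1] + gap               # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[len(b)]


identical = needleman_wunsch_score("GATTACA", "GATTACA")
one_gap = needleman_wunsch_score("ACGT", "AGT")
```

A full aligner would also trace back through the matrix to recover the aligned sequences; only the score is computed here to keep the sketch short.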

GC content calculation

GC Content Calculator

GC Content Calculator analyzes a DNA/RNA sequence, calculates its GC content, and plots the GC content distribution using a sliding window.
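
The sliding-window calculation can be sketched in Python as below. This is only an illustration of the general technique, not the code behind the linked calculator. The window and step sizes are arbitrary assumptions, and the function names are hypothetical.

```python
def gc_content(seq):
    """Return the GC content of a sequence as a percentage (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def sliding_gc(seq, window=100, step=50):
    """Return (window start, GC %) pairs for each full window in seq.

    Start positions are 0-based; a trailing partial window is skipped.
    """
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


profile = sliding_gc("AAGGCCTT", window=4, step=2)
```

The resulting list of (position, percentage) pairs is what a plotting tool would draw as the GC content distribution along the sequence.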

GC Content Plot Online

GC Content Plot Online is a tool for calculating the percentage GC content (%GC) of a sequence. The advantage of this software is its high speed.

Bacterial species identification

ANI (Average Nucleotide Identity) Calculator

ANI Calculator computes the ANI value between two prokaryotic genome sequences, a measure used when classifying and identifying bacteria.

JSpeciesWS Online Service

JSpeciesWS is a quick and easy-to-use online service for measuring the probability that two or more (draft) genomes belong to the same species, by pairwise comparison of (i) their ANI values and/or (ii) the correlation indexes of their tetranucleotide signatures.
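
To give a feel for what a tetranucleotide signature correlation is, the following Python sketch counts 4-mer frequencies in two sequences and computes their Pearson correlation. It is a deliberately simplified assumption, not the JSpeciesWS computation: the service's normalization of the signatures and its handling of both DNA strands are not described on this page and are omitted. All function names are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides over the DNA alphabet, in fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Return relative frequencies of all 256 tetranucleotides in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def pearson(x, y):
    """Return the Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


genome_a = "ACGTACGTTGCAAGGCTTAGC"
genome_b = "TTGACCGATAGGCTAACGTAC"
correlation = pearson(tetra_freqs(genome_a), tetra_freqs(genome_b))
```

Real genomes are millions of bases long, so the frequency vectors are far better populated than in this toy input; the short strings are only there to make the sketch runnable.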

EzTaxon

EzTaxon is a web-based tool for the identification of prokaryotes based on 16S ribosomal RNA gene sequences, and contains sequences of type strains of prokaryotic species with validly published names.

SpeciesFinder

SpeciesFinder predicts prokaryotic species based on 16S rRNA gene similarity to known reference sequences.

Multilocus sequence typing (MLST)

pMLST

pMLST provides databases of definitions for MLST of IncA/C, IncI1, IncHI1, IncHI2, IncF and IncN plasmids.

pubMLST

pubMLST is a collection of public databases for molecular typing and microbial genome diversity.

SRST2

SRST2 is designed to take Illumina sequence data, an MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc.) and report the presence of sequence types (STs) and/or reference genes.

Others

ACLAME

ACLAME is a database dedicated to the collection and classification of mobile genetic elements from various sources, comprising all known phage genomes, plasmids and transposons.

Galileo AMR

Galileo AMR allows users to browse its repository of antibiotic resistance genes and selected mobile genetic elements, to annotate resistance genes and associated mobile genetic elements in bacterial DNA sequences, and to contribute new resistance genes and mobile genetic elements not yet in the database.

GyDB

GyDB is an open, editable database on the evolutionary relationships of viruses, mobile genetic elements and genomic repeats.

ImmeDB

ImmeDB is a database dedicated to the collection, classification, and annotation of mobile genetic elements from the gut microbiome.

PLSDB

PLSDB contains bacterial plasmids retrieved from the NCBI nucleotide database. It provides an overview of the collected plasmids, which can be filtered by various parameters such as length, topology, and taxonomic information.