ClustAGE

UPDATED:

Option to output graphics in png or pdf format

Fixed bug that may have rarely led to small bins being omitted

Performance upgrades

ClustAGE takes a set of nucleotide sequences of accessory genomic elements (AGEs) and clusters them to identify the minimum set of AGEs in the population and to determine the distribution of each AGE among the genomes.

AGE files must be in fasta format with each file containing a set of AGE sequences from a single strain's genome. Example of an AGE sequence file.
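
As a quick local check before uploading, the following minimal Python sketch reads FASTA-formatted text into (header, sequence) pairs. It is not part of ClustAGE; the function name `read_fasta` and the example sequence names are hypothetical and chosen only for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical AGE sequences from a single strain (names are made up).
example = """>strainA_AGE_0001
ATGCGTACGTTAGC
GGCTAACG
>strainA_AGE_0002
TTGACCGTAAGC
"""

records = read_fasta(example)
for name, seq in records:
    print(name, len(seq))
```

A file that raises `ValueError` here, or that yields no records, is unlikely to be valid FASTA.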

Each analysis is limited to 2-15 AGE sequence files. To analyze a larger number of AGE sequence files, please download the Perl scripts to run locally (Link). Please also download the standalone software if you would like the option to use raw sequencing reads to verify AGE distributions.

OPTIONAL Ranks: Strains can be assigned a numeric rank. This can be a real number (for example, a cytotoxicity assay value), a relative number (for example, relative virulence), or a group assignment (for example, 0 for antibiotic-sensitive, 1 for antibiotic-resistant). Decimals, negative numbers, and scientific notation (e.g. 1E-14) are allowed. Please leave this field blank if no ranking information is available. If the rank is set to "R", the genome will be treated as a reference: its sequences will NOT be used as AGE representatives, but alignments of this genome against AGE representatives will be reported.
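
The accepted rank values described above can be illustrated with a short Python sketch. This is an assumption about how such a field might be interpreted, not ClustAGE's own code; the function name `parse_rank` and the returned labels are hypothetical, and the sketch treats only an exact "R" as the reference marker because the text does not say whether other spellings are accepted.

```python
import math


def parse_rank(field):
    """Classify a rank field as ('none', None), ('reference', None) or ('numeric', value)."""
    value = field.strip()
    if value == "":
        return ("none", None)        # blank: no ranking information
    if value == "R":
        return ("reference", None)   # reference genome, never an AGE representative
    number = float(value)            # raises ValueError for non-numeric text
    if not math.isfinite(number):
        raise ValueError("rank must be a finite number")
    return ("numeric", number)


# Decimals, negative numbers, and scientific notation all parse as numbers.
for raw in ["", "R", "1", "-2.5", "1E-14"]:
    print(repr(raw), parse_rank(raw))
```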

OPTIONAL Annotations: If you have annotation information for the AGE sequences available, you may include it here. Annotation files must be formatted as output by Spine (v0.2 or above) or AGEnt (v0.2 or above). For more information about formatting and an example, see here.

Accessory Genomic Element Sequences:

Tip: Files can be dragged and dropped (in most browsers)

Genome #01: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #02: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #03: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #04: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #05: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #06: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #07: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #08: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #09: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #10: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #11: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #12: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #13: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #14: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Genome #15: Sequences (fasta) Genome ID:
[Optional] Rank: [Optional] Annotation file:

Options:

Maximum e-value cutoff for alignments to be clustered together.
Minimum nucleotide percent identity for alignments to be clustered together.
Minimum accessory genomic element size, in bases. Only AGEs this length or longer will be used as potential cluster representatives.
Minimum size of alignments against accessory elements to report. If this value is too large, true alignments may be missed; if it is too small, nonspecific alignments will be included. The default of 100 is usually a good compromise.
Plot subelement dividers: Yes / No
Plot format: png / pdf
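
To make the role of each threshold concrete, here is a minimal Python sketch of how the four numeric options could act as filters. It is an illustration under stated assumptions, not ClustAGE's implementation: the `Hit` record, the function names, and every cutoff value except the stated default of 100 for minimum alignment size are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical alignment record; field names are assumptions."""
    evalue: float     # alignment e-value
    identity: float   # nucleotide percent identity, 0-100
    length: int       # alignment length in bases


def can_cluster(hit, max_evalue, min_identity):
    """An alignment is clustered only if it passes both cutoffs."""
    return hit.evalue <= max_evalue and hit.identity >= min_identity


def is_reportable(hit, min_alignment_size=100):
    """Alignments shorter than the minimum size are not reported."""
    return hit.length >= min_alignment_size


def eligible_representatives(age_lengths, min_age_size):
    """Only AGEs of at least the minimum size may represent a cluster."""
    return [name for name, length in age_lengths.items() if length >= min_age_size]


# Illustrative values only; these are not the tool's defaults.
strong = Hit(evalue=1e-20, identity=95.0, length=250)
weak = Hit(evalue=1e-3, identity=99.0, length=80)
print(can_cluster(strong, max_evalue=1e-6, min_identity=85.0))
print(can_cluster(weak, max_evalue=1e-6, min_identity=85.0))
print(is_reportable(weak))
print(eligible_representatives({"age_a": 150, "age_b": 450}, min_age_size=200))
```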

Please wait for sequences to upload after pushing the button. Depending on your connection and file sizes, this may take a few minutes.

Email Egon with questions or bug reports.