ClustAGE takes a set of nucleotide sequences of accessory genomic elements (AGEs) and clusters them to identify the minimum set of AGEs in the population and to determine how each AGE is distributed among the genomes.
AGE files must be in FASTA format, with each file containing the set of AGE sequences from a single strain's genome. Example of an AGE sequence file.
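Before uploading, it may help to confirm that each AGE file parses as nucleotide FASTA. The following Python sketch is not part of ClustAGE; the function name read_fasta and the set of accepted characters (IUPAC nucleotide codes plus a gap character) are assumptions made for illustration, and ClustAGE's own checks may differ.

```python
from typing import Dict, Iterable

# Assumption: IUPAC nucleotide codes plus '-' are acceptable sequence characters.
NUCLEOTIDE_CHARS = set("ACGTUNRYKMSWBDHV-")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dictionary.

    Raises ValueError if the input is not well-formed nucleotide FASTA.
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            current_id = header[0]  # the ID is the first word after '>'
            if current_id in records:
                raise ValueError(f"duplicate record id: {current_id}")
            records[current_id] = ""
        else:
            if current_id is None:
                raise ValueError("sequence data found before the first '>' header")
            seq = line.upper()
            bad = set(seq) - NUCLEOTIDE_CHARS
            if bad:
                raise ValueError(f"non-nucleotide characters in {current_id}: {sorted(bad)}")
            records[current_id] += seq
    if not records:
        raise ValueError("no FASTA records found")
    return records
```

To check a file on disk, pass an open file handle, for example read_fasta(open("strainA_AGEs.fasta")), where the file name is hypothetical.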
Each analysis is limited to 2 - 15 AGE sequence files. To analyze a larger number of AGE sequence files, please download the Perl scripts to run locally (Link). Please also download the standalone software if you would like the option to use raw sequencing reads to verify AGE distributions.
OPTIONAL Ranks: Strains can be assigned a numeric rank. This can be a real number (for example, a cytotoxicity assay value), a relative number (for example, relative virulence), or a group assignment (for example, 0 for antibiotic-sensitive, 1 for antibiotic-resistant). Decimals, negative numbers, and scientific notation (e.g. 1E-14) are allowed. Please leave this field blank if no ranking information is available. If the rank is set to "R", the genome will be treated as a reference: sequences belonging to this genome will NOT be used as AGE representatives, but alignments of this genome against AGE representatives will be reported.
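The rank rules above can be summarized in a short Python sketch. This is an illustration only, not ClustAGE's own code: the function name parse_rank is hypothetical, and it assumes that "R" must be an exact uppercase match and that non-finite values such as "nan" or "inf" are not valid ranks.

```python
import math
from typing import Optional, Union

REFERENCE = "R"  # marker for a reference genome, as described above


def parse_rank(field: str) -> Optional[Union[float, str]]:
    """Interpret one rank field.

    Returns None for a blank field (no ranking information),
    the string "R" for a reference genome, or a float otherwise.
    Raises ValueError for anything else.
    """
    text = field.strip()
    if text == "":
        return None
    if text == REFERENCE:  # assumption: case-sensitive match
        return REFERENCE
    value = float(text)  # accepts decimals, negatives, and scientific notation
    if not math.isfinite(value):  # assumption: nan/inf are rejected
        raise ValueError(f"rank must be a finite number: {field!r}")
    return value
```

For example, parse_rank("1E-14") gives a float, parse_rank("") gives None, and parse_rank("high") raises ValueError.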
OPTIONAL Annotations: If you have annotation information for the AGE sequences available, you may include it here. Annotation files must be formatted as output by Spine (v0.2 or above) or AGEnt (v0.2 or above). For more information about formatting and an example, see here.
Email Egon with questions or bugs.