Remarkable sequence signatures in archaeal genomes
AHMED FADIEL,1,2 STUART LITHWICK,3 GOPI GANJI,3 and STEPHEN W. SCHERER,1
1 The Center for Applied Genomics, Hospital for Sick Children, Toronto, Ontario M5G 1Z8, Canada
2 Author to whom correspondence should be addressed
3 Bioinformatics Supercomputing Centre, The Genomics and Genetics Biology Program, Hospital for Sick Children, Toronto, Ontario M5G 1Z8, Canada
Received October 15, 2002; accepted November 6, 2002; published online February 19, 2003
Complete archaeal genomes were probed for long (≥ 25 bp) oligonucleotide repeats (words). We detected many words arranged in tandem with a narrow range of periodicity (i.e., spacer length between successive repeats). No similar words were identified in the genomes of the non-archaeal species examined, namely Escherichia coli, Bacillus subtilis, Haemophilus influenzae, Mycoplasma genitalium and Mycoplasma pneumoniae. BLAST similarity searches against the GenBank nucleotide sequence database showed that these words were specific to individual archaeal species, indicating that they serve as species signatures. Sequence analysis and genome viewing tools showed these repeats to be restricted to non-coding regions. Thus, archaea appear to possess a non-coding genomic signature that was not found in the bacterial species examined. A species-specific genomic signature of this kind could be of considerable value for archaeal genome mapping, evolutionary studies and analyses of genome complexity.
Keywords: Archaea, bioinformatics, comparative genomics, genome signature, oligonucleotide frequencies
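As a rough illustration of the repeat search summarized in the abstract, the following Python sketch indexes every exact 25-mer in a sequence and keeps the words that recur at least a few times with tightly clustered spacer lengths. It is not the authors' pipeline. It assumes exact matching on a single strand, and the function name `find_periodic_words` and the thresholds `min_copies` and `max_spacer_spread` are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def find_periodic_words(seq, k=25, min_copies=3, max_spacer_spread=10):
    """Return exact k-mers that recur with a narrow range of spacer lengths.

    Hypothetical sketch: exact matches only, one strand only; min_copies and
    max_spacer_spread are illustrative thresholds, not published values.
    Result maps word -> (list of start positions, list of spacer lengths).
    """
    seq = seq.upper()
    positions = defaultdict(list)
    # Index every k-mer by its start position in a single pass.
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    hits = {}
    for word, starts in positions.items():
        # At least two copies are needed to define a spacer at all.
        if len(starts) < max(2, min_copies):
            continue
        # Spacer = bases between the end of one copy and the start of the next.
        spacers = [b - (a + k) for a, b in zip(starts, starts[1:])]
        # Overlapping copies (negative spacer) are not tandem words with a spacer.
        if min(spacers) < 0:
            continue
        # Keep only words whose spacer lengths fall in a narrow range.
        if max(spacers) - min(spacers) <= max_spacer_spread:
            hits[word] = (starts, spacers)
    return hits
```

The same function could be run unchanged on archaeal and bacterial sequences to compare how many periodic words each yields. Because a repeat longer than 25 bp produces several overlapping 25-mers with identical spacers, a practical version would merge such hits into a single repeat unit before counting them.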