KOPRI Repository

CLUSTOM: A Novel Method for Clustering 16S rRNA Next Generation Sequences by Overlap Minimization

Cited 12 times in Scopus
Other Titles
CLUSTOM: A 16S rRNA Clustering Program
Authors
Hwang, Kyuin
Kim, Kyung Mo
Hong, Soon Gyu
Caetano-Anollés, Gustavo
Hou, Bo Kyeng
Yu, Dong Su
Kim, Byung Kwon
Kim, Tae-Kyung
Oh, Jeongsu
Subject
Science & Technology - Other Topics
Keywords
16S rRNA; clustering; overlap minimization
Issue Date
2013
Publisher
www.plosone.org
Citation
Hwang, Kyuin, et al. 2013. "CLUSTOM: A Novel Method for Clustering 16S rRNA Next Generation Sequences by Overlap Minimization". PLOS ONE, 8(5): e62623-e62623.
Abstract
The recent nucleic acid sequencing revolution driven by shotgun and high-throughput technologies has led to a rapid increase in the number of sequences for microbial communities. The availability of 16S ribosomal RNA (rRNA) gene sequences from a multitude of natural environments now offers a unique opportunity to study microbial diversity and community structure. The large volume of sequencing data however makes it time consuming to assign individual sequences to phylotypes by searching them against public databases. Since ribosomal sequences have diverged across prokaryotic species, they can be grouped into clusters that represent operational taxonomic units. However, available clustering programs suffer from overlap of sequence spaces in adjacent clusters. In natural environments, gene sequences are homogenous within species but divergent between species. This evolutionary constraint results in an uneven distribution of genetic distances of genes in sequence space. To cluster 16S rRNA sequences more accurately, it is therefore essential to select core sequences that are located at the centers of the distributions represented by the genetic distance of sequences in taxonomic units. Based on this idea, we here describe a novel sequence clustering algorithm named CLUSTOM that minimizes the overlaps between adjacent clusters. The performance of this algorithm was evaluated in a compar
URI
http://repository.kopri.re.kr/handle/201206/6462
DOI
http://dx.doi.org/10.1371/journal.pone.0062623
