Scientists Propose a New Method for Reconstructing Coding Genes

Publish Date: 2016-12-13 23:42:39

A research team from the Beijing Institutes of Life Science, Chinese Academy of Sciences, has put forward a novel codon-based de Bruijn graph algorithm that identifies and reconstructs coding genes directly from transcriptome sequencing data, without first assembling the reads into transcripts. It addresses the low efficiency and incompleteness of existing coding-gene identification. The method holds considerable promise for genomic and evolutionary studies of non-model organisms. The results have been published in Genome Biology.

 

In recent years, the rapid expansion of high-performance computing and high-throughput sequencing technologies has promoted the successful implementation of genome sequencing projects, which have produced vast amounts of biological omics data. Faced with transcriptome data, scientists first need to extract the coding-gene information they contain. Traditionally, gene identification tools depend mainly on transcripts produced by RNA-seq assembly software. However, assembly software is highly sensitive to sequencing errors and cannot handle repetitive regions efficiently, so gene identification carried out this way yields a large number of redundant and fragmented gene sequences. Moreover, these tools rely heavily on homologous gene databases or a reference genome and cannot be applied effectively to transcriptome data from non-model organisms. A new algorithm for reconstructing coding genes from transcriptome data is therefore needed.

 

To solve the problem of coding-gene identification in transcriptome data analysis, the researchers developed a new algorithm, inGAP-CDG, based on a codon de Bruijn graph. Instead of relying on reference genes, the method identifies genes directly from unassembled transcriptome sequencing data. Using real transcriptome sequencing data, the researchers systematically evaluated the predicted genes in terms of length, sensitivity, redundancy, error rate and heterozygosity. Compared with other methods, the coding sequences constructed by inGAP-CDG are longer, less redundant and more specific. The research provides a new method for gene identification and has important application value for phylogenetic and functional genomics research.
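For readers unfamiliar with the idea, the general notion of a codon-based de Bruijn graph can be illustrated with a small Python sketch. This is not the inGAP-CDG implementation: the function names (codon_kmers, build_codon_graph, extend_path), the word size k and the toy reads are all assumptions made for illustration. The sketch only shows the core idea that graph nodes are words of k whole codons and that neighbouring nodes are shifted by one codon (three nucleotides) rather than by a single nucleotide.

```python
from collections import defaultdict


def codon_kmers(read, k, frame):
    """Yield successive words of k codons (3*k nt) from `read`,
    starting at reading frame 0, 1 or 2 and stepping one codon at a time."""
    width = 3 * k
    for start in range(frame, len(read) - width + 1, 3):
        yield read[start:start + width]


def build_codon_graph(reads, k=3):
    """Build a toy codon de Bruijn graph (hypothetical helper).

    Nodes are k-codon words; an edge links two words that are adjacent
    in a read, i.e. that overlap by k-1 codons. All three forward
    reading frames are added because the true frame is unknown.
    """
    graph = defaultdict(set)
    for read in reads:
        for frame in range(3):
            words = list(codon_kmers(read, k, frame))
            for left, right in zip(words, words[1:]):
                graph[left].add(right)
    return graph


def extend_path(graph, start):
    """Follow unambiguous edges from `start` and spell out the sequence.

    Extension stops at a branch, a dead end or a cycle. Each step
    contributes only the last codon of the newly reached word.
    """
    path = [start]
    seen = {start}
    node = start
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # avoid looping forever on a cycle
            break
        seen.add(node)
        path.append(node)
    return path[0] + "".join(word[-3:] for word in path[1:])


if __name__ == "__main__":
    # Two overlapping toy reads; neither contains the whole sequence.
    reads = ["ATGGCTGCAAAGTTT", "GCTGCAAAGTTTGGA"]
    graph = build_codon_graph(reads, k=2)
    print(extend_path(graph, "ATGGCT"))
```

In this toy example the two reads are joined into one longer in-frame sequence without a separate assembly step. A real tool would additionally have to handle sequencing errors, reverse-complement strands, stop codons, branching caused by heterozygosity, and the choice of the correct reading frame, none of which is attempted here.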

 


Time: 2016-11-30

Source: China Science Daily

