Scientists Build the Highest-Quality Plant Genome Reference Sequence

Publish Date: 2017-05-09 23:26:20

With the development of PacBio single-molecule real-time (SMRT) sequencing technology, it has become possible to assemble high-quality draft genomes using this technology alone. However, these draft sequences still contain a number of errors: many sequences are chimeric (i.e., sequences from different locations are joined together), and assembly quality is poor in some regions, especially repeat regions, some of which are not assembled and some of which are assembled into multiple sequences. Moreover, these errors are often difficult to detect. In addition, SMRT sequencing alone can only assemble relatively small DNA fragments. Obtaining whole-chromosome sequences requires a genetic map or the recently developed Hi-C technology. However, simply using these two techniques to build chromosome sequences still has many defects: (1) small fragments are difficult to place on the chromosomes, leaving a large number of gaps in the chromosomes; (2) the fragments that have been placed contain a large number of ordering errors and orientation errors. Using such chromosomes as a reference genome for gene mapping would lead to missing genes and erroneous gene localization.

 

In order to use the available technology for high-quality plant genome assembly, the Zhicheng LIANG research group from the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, in cooperation with Professor Shigui LI from Sichuan Agricultural University, has been carrying out PacBio single-molecule sequencing of the indica rice variety Shuhui 498 (R498) since 2014. By combining this with a genetic map and fosmid library sequencing, and validating the result with a BioNano optical map, they finally obtained a genome 390.3 Mb in length that consists of 17 contiguous DNA fragments (Super-Contigs). It includes 7 chromosomes that are each assembled end to end as a single Super-Contig and 5 chromosomes that are each divided into two Super-Contigs.

 

The Shuhui 498 genome is the highest-quality genome among all higher plants and animals. Except for five centromeric repeat regions and several other tandem repeats, the entire genome is assembled. The completeness and continuity of this genome are much higher than those of the Nipponbare genome and the Arabidopsis genome, and its error rate is lower. This result also shows that the genome size of indica does not exceed 395 Mb. The researchers found two nucleolus organizing regions in the R498 sequence, while there is only one in the Nipponbare genome. A comparison of the gene sequences of the two genomes shows that more than two thirds of the genes differ between them. There is also a large amount of structural variation between the two genomes due to independent transposon insertions.

 

In addition, they assembled a complete mitochondrial sequence and found several major errors in the existing indica mitochondrial sequence. The researchers also found that the current indica genome reference sequence incorrectly incorporates many mitochondrial and chloroplast sequences. As a reference genome, the R498 sequence will be used for mapping indica mutant genes and for genome-wide association studies of indica populations. The completion of the Shuhui 498 genome shows that it is feasible, with currently available technology, to obtain a high-quality reference genome with continuous sequence at the chromosome level. It offers guidance for improving the quality of genome assemblies of higher plants and animals.

The research was published in Nature Communications on May 4, 2017.


Time:2017-05-05

Source: Institute of Genetics and Developmental Biology


