Sid Talks Testing – Plant DNA Barcoding: the State of the Science

12/14/15

As we at Alkemist Labs prepare to include Next Generation DNA Sequencing as one of the testing tools we offer, we’ve run across some useful articles. The article I’ve summarized below, by Xiwen Li, Yang Yang, Robert J. Henry, Maurizio Rossetto, Yitao Wang and Shilin Chen (Biol. Rev. (2015), 90, pp. 157–166), may be helpful to people in our industry interested in understanding this comparatively new technology.

DNA barcoding is currently a widely used and effective tool that enables rapid and accurate identification of plant species; however, none of the available loci work across all species (a locus, plural loci, is the specific location or position of a gene or DNA sequence on a chromosome). Since single-locus DNA barcodes lack adequate variation among closely related taxa (a taxon, plural taxa, is a group of one or more populations of an organism or organisms seen by taxonomists to form a unit), recent barcoding studies have placed high emphasis on the use of whole-chloroplast genome sequences (a genome is a complete set of DNA, including all of its genes), which are now more readily available as a consequence of improving sequencing technologies.

A new approach is being advocated for DNA barcoding that, for selected groups of taxa, combines the use of single-locus barcodes and super-barcodes for efficient plant identification. These specific barcodes might enhance our ability to distinguish closely related plants at the species and even population levels.

There are an estimated 300,000 plant species in the world but relatively few of these can be identified based on traditional plant identification methods. Accurate classification and identification of this large number of species remains a significant challenge even for specialist taxonomists. The emergence of DNA barcoding has had a positive impact on biodiversity classification and identification. DNA barcoding is a technique for characterizing species of organisms using a short DNA sequence from a standard and agreed-upon position in the genome. Since it was first put forward and widely applied in animals, DNA barcoding has attracted much attention from taxonomists. DNA barcoding can also be used for a wide range of purposes: to support ownership or intellectual property rights; to reveal cryptic species; in forensics to link biological samples to crime scenes; to support food safety and authenticity of labeling by confirming identity and purity; and in ecological and environmental genomic studies.
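To make the idea of identifying a species from a short sequence at an agreed-upon position more concrete, here is a minimal sketch in Python. It assumes the query and reference barcodes are already aligned to the same length, and it scores each reference by simple percent identity. The species names, the sequences, and the helper names `identity` and `best_match` are all invented for illustration; real barcoding workflows rely on curated reference libraries and proper alignment tools.

```python
# Toy reference library: species name -> barcode sequence at one agreed locus.
# All names and sequences are hypothetical, made up for this illustration.
REFERENCE = {
    "Species A": "ATGCCGTTAGCTAGGCTA",
    "Species B": "ATGCCGTTCGCTAGGTTA",
    "Species C": "ATGACGTTAGCAAGGCTA",
}


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def best_match(query: str, reference: dict) -> tuple:
    """Return (species, identity) for the closest reference barcode."""
    scored = ((name, identity(query, seq)) for name, seq in reference.items())
    return max(scored, key=lambda pair: pair[1])


# A query that differs from the "Species B" barcode at a single position.
query = "ATGCCGTTCGCTAGGTTT"
species, score = best_match(query, REFERENCE)
print(species, round(score, 3))
```

In this toy case the query is assigned to the reference it most resembles. When two species share the same sequence at the chosen locus, this kind of lookup cannot tell them apart, which is the limitation the rest of the article discusses.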

Global DNA barcoding was initially regarded as a ‘big science’ program and even as the renaissance of taxonomy. However, the cytochrome c oxidase 1 (CO1) sequence, which has been developed as a universal barcode in animals, does not discriminate most plants because of a much slower mutation rate. Although many studies have searched for a universal plant barcode, none of the available loci work across all species. The Consortium for the Barcode of Life (CBOL) Plant Working Group recently recommended the two-locus combination of matK + rbcL as the best plant barcode, even though its discriminatory efficiency is only 72%. Taxonomists have suggested that a multi-locus method may be necessary to discriminate plant species. However, CBOL demonstrated that the use of multiple loci did not clearly improve the species-level discriminatory ability of these techniques.
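One simple way to think about a figure like "discriminatory efficiency" is as the fraction of species whose barcode is not shared with any other species in the sample set. The sketch below uses that simplified definition, which is an assumption for illustration and not necessarily how CBOL computed its number. The species, the locus names `locus1` and `locus2`, and the sequences are invented toy data, not real matK or rbcL sequences.

```python
from collections import Counter

# Hypothetical toy data: species -> sequence at each of two loci.
SAMPLES = {
    "Species A": {"locus1": "ACGT", "locus2": "GGTA"},
    "Species B": {"locus1": "ACGT", "locus2": "GGTC"},
    "Species C": {"locus1": "ACGA", "locus2": "GGTA"},
    "Species D": {"locus1": "ACGT", "locus2": "GGTC"},
}


def discrimination_rate(samples: dict, loci: list) -> float:
    """Fraction of species whose combined barcode is unique in the sample set."""
    # A tuple of per-locus sequences acts as the (possibly multi-locus) barcode.
    barcodes = {
        species: tuple(seqs[locus] for locus in loci)
        for species, seqs in samples.items()
    }
    counts = Counter(barcodes.values())
    unique = sum(1 for barcode in barcodes.values() if counts[barcode] == 1)
    return unique / len(barcodes)


for loci in (["locus1"], ["locus2"], ["locus1", "locus2"]):
    print(loci, discrimination_rate(SAMPLES, loci))
```

In this toy set, combining the two loci separates more species than either locus alone, yet two species still share an identical combined barcode, so the rate stays below 100%.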

Researchers have recently proposed the use of the whole-plastid sequence in plant identification (plastids are double-membrane, sac-like organelles, generally involved in either the manufacture or storage of food). However, this concept has not yet been universally accepted. One of the main concerns is the high sequencing cost and the difficulty of obtaining complete plastid genome sequences in comparison to the use of single-locus barcodes. Some researchers have argued that the full plastid haplotype (a haplotype is a set of DNA variations, or polymorphisms, that tend to be inherited together) is not a good marker because it does not always track species boundaries. To date, it is still unclear whether plastid genomes can be regarded as a suitable barcode.

Traditional barcodes have been widely studied but still have significant limitations. Despite extensive efforts to identify a universal plant barcode comparable to CO1 in animals, the task has proved difficult due to the lack of adequate variation within single loci between species. Many researchers have suggested that a multi-locus method will be required to obtain adequate species discrimination, and various combinations have been proposed. While these combined barcodes exhibit higher species discrimination than single-locus approaches, comparisons of barcode combinations using the same large-scale taxonomic samples found that they could identify no more than 70% of tested species. Thus, combining candidate loci cannot eliminate the inherent deficiencies of current DNA barcoding of plants.

It has recently been pointed out that the complete chloroplast (cp) genome contains as much variation as the CO1 locus in animals and may be used as a plant barcode (a chloroplast is a plastid containing chlorophyll and other pigments, occurring in plants and algae that carry out photosynthesis). The cp-genome has been used as a versatile tool for phylogenetics (the study of the evolutionary development of a particular group of organisms). It can greatly increase resolution at lower taxonomic levels in plant phylogenetics, and was therefore proposed as a species-level DNA barcode. Compared with the nuclear genome, the cp-genome is small in size and has a higher interspecific and lower intraspecific divergence, which makes it more suitable as a genome-based barcode.

Although sequences from single or multiple chloroplast and nuclear genes have been useful for differentiating species, the whole cp-genome has been used efficiently to distinguish between closely related species, populations and individuals. Software has now been developed that is particularly appropriate for evaluating species boundaries using partial or entire cp-genomes as a plant super-barcode. The main challenges of super-barcoding are the establishment of a rich cp-genome database and the reduction of sequencing cost, as well as obtaining a higher quality and quantity of DNA. With the development of next-generation sequencing (NGS), the number of cp-genomes sequenced has increased rapidly.

Neither extraction methods nor sequencing capacity can any longer be considered limiting factors for obtaining cp-genome data, as NGS can generate many individual super-barcodes. As sequencing technology and bioinformatics (the retrieval and analysis of biochemical and biological data using mathematics and computer science, as in the study of genomes) continue to improve rapidly, complete genome sequencing will become more popular and may eventually replace Sanger-based (first-generation sequencing) DNA barcoding.

In summary, single-locus barcodes lack adequate variation, while fully annotated super-barcodes can currently be costly and may be overly complicated for laboratories that lack the necessary experience. To resolve this challenge, the concept of ‘specific barcodes’ has been proposed, which involves a trade-off between single-locus barcodes and super-barcodes. A specific barcode is a fragment of DNA sequence that has a sufficiently high mutation rate to enable species identification within a given taxonomic group. This approach is simpler than obtaining super-barcodes for each sample, and many informative markers are available to choose from.
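As a rough illustration of how a candidate specific barcode might be located, the sketch below slides a fixed-width window along a set of aligned sequences from one taxonomic group and reports the window containing the most variable positions. This is only one simple heuristic, assumed here for illustration; it is not the method described in the article. The alignment, the window width, and the function names `variable_sites` and `most_variable_window` are hypothetical.

```python
# Hypothetical aligned sequences from three closely related species.
ALIGNMENT = [
    "AAAAAAAAACGTAAAA",
    "AAAAAAAAATGCAAAA",
    "AATAAAAAAGCAAAAA",
]


def variable_sites(alignment: list) -> list:
    """Indices of alignment columns where the sequences are not all identical."""
    return [i for i, column in enumerate(zip(*alignment)) if len(set(column)) > 1]


def most_variable_window(alignment: list, width: int) -> tuple:
    """Return (start, count) for the first window holding the most variable sites."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    if width > length:
        raise ValueError("window is wider than the alignment")
    variable = set(variable_sites(alignment))
    best_start, best_count = 0, -1
    for start in range(length - width + 1):
        count = sum(1 for i in range(start, start + width) if i in variable)
        if count > best_count:  # strict comparison keeps the earliest best window
            best_start, best_count = start, count
    return best_start, best_count


start, count = most_variable_window(ALIGNMENT, width=4)
print("candidate region starts at", start, "with", count, "variable sites")
```

A region picked this way would still need checking against more samples of each species, since a useful specific barcode has to vary between species while staying stable within them.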

The initial goal of DNA barcoding was to find a universal locus for the identification of all plants. However, there is no such universal barcode locus for land plants, especially in the chloroplast. That is why the specific-barcode approach relies on the use of dedicated cp-regions for each target group of species. While markers used in single-locus DNA barcodes, such as the rbcL region, can provide resolution at a higher taxonomic rank (e.g. family or genus), specific barcodes can assist species-level identifications.

Although the increased availability of published cp-genomes will facilitate the design of specific barcodes, current advances in NGS provide further opportunities for this approach. The ultimate goal of DNA barcoding is to distinguish species rather than find a universal marker. Specific barcodes for each plant group suitable for application in traditional laboratories may be defined based upon the analysis of whole-chloroplast data. Specific barcoding is expected to become more widely used, providing fast and accurate molecular identifications at the species and population levels.

Alkemist Labs will soon be offering NGS for DNA ‘specific barcoding’ and is looking forward to using the data collected to improve the current state of DNA barcoding for the identification of botanicals, primarily for the Dietary Supplement Industry.

Sidney Sudberg
CSO/President

Each month Sidney Sudberg, Alkemist Founder & CSO, helps to demystify the science behind testing by discussing a testing-related topic.