In recent years, the introduction of new DNA-sequencing technologies has heralded a revolution in genetics. These new technologies have resulted in a precipitous drop in the cost of genome sequencing; a whole genome can now be sequenced for a few thousand dollars.
As costs continue to fall, genome sequencing will become increasingly commonplace, enabling the research and medical communities to better understand the causes of genetic disease. Ultimately, doctors and researchers will be able to turn to genome sequencing for guidance regarding disease diagnosis and treatment decisions.
This week’s edition of Nature highlights a paper from a large-scale genomics project that represents the collaborative efforts of nearly 400 scientists working at over 100 universities, research institutions and companies around the world. As one of the research scientists involved in the 1000 Genomes Project, I’d like to share some of the project’s findings and highlight their implications for the research and medical communities.
The human genome contains approximately three billion bases (or letters) of DNA sequence and provides the entire set of genetic instructions for all of the cells in the body. While most bases in the human genome are the same for everybody, there are also millions of locations where differences exist among individuals. The vast majority of these DNA differences (or variants) have little or no biological impact, as they are located far away from genes. However, a small number do have an effect on the function or expression of a gene, and are therefore responsible for the inherited differences that we observe between people, such as hair or eye color.
Most of the variants found in the human genome are quite rare, occurring in less than one percent of the population. In recent years, these “rare variants” have attracted particular interest, as it is increasingly believed that certain combinations of rare variants can be important in determining the risk of developing common diseases such as hypertension and diabetes. A rare variant can only be found if at least one of the people sequenced happens to carry it, so identifying as many rare variants as possible requires sequencing the genomes of large numbers of individuals. With this aim, the 1000 Genomes Project has sequenced the genomes of 1,092 people from around the world.
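To see why sample size matters so much for rare variants, here is a rough back-of-the-envelope sketch in Python. It is not a calculation from the project itself: it assumes a simplified model in which each person carries two independent copies of the genome and every copy has the same chance of carrying the variant. The function name and the sample sizes other than 1,092 are chosen purely for illustration.

```python
def detection_probability(variant_frequency: float, num_people: int) -> float:
    """Chance that a variant shows up at least once among the people sequenced.

    Simplifying assumption (ours, not the project's): each person contributes
    two independent genome copies, each carrying the variant with probability
    `variant_frequency`.
    """
    num_copies = 2 * num_people
    # Probability that no copy carries the variant, subtracted from one.
    return 1.0 - (1.0 - variant_frequency) ** num_copies


if __name__ == "__main__":
    # A variant found in one percent of genome copies, at three study sizes.
    for num_people in (10, 100, 1092):
        chance = detection_probability(0.01, num_people)
        print(f"{num_people:>5} people sequenced: {chance:.3f}")
```

Under these assumptions, a study of ten people would usually miss a one-percent variant altogether, while a study of about a thousand people would almost certainly observe it at least once.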
By sequencing the genomes of so many individuals, we are learning what variation we would expect to see in a typical genome. For example, we now know that a typical genome contains about 3.7 million variants. One of the more surprising findings is that the genome of a typical person contains about 150 variants that result in dysfunctional versions of genes. This is particularly interesting because the individuals studied by the 1000 Genomes Project did not exhibit any obvious signs of ill health.
While it is impossible to measure the genetic variation of the world’s seven billion people, the 1000 Genomes Project represents an important step in understanding the stunning variety of genomes found in members of the human species. The project has resulted in a vast data set, with 39.4 million variants identified in these genomes, ranging from single base-pair changes in the genetic sequence to large deletions of sequence covering thousands of bases.
Sizable projects such as the 1000 Genomes Project have become possible in recent years only because of the introduction of new sequencing technologies and the increasing power of modern-day computers. However, generating such large amounts of data is still generally beyond the means of any single research lab. The project is therefore a great example of scientific collaboration, and all of its data are freely available on the Internet. This open sharing of information allows a wide range of researchers to use the data, and thereby enables research that wouldn’t otherwise have been possible.