Wednesday 11 March 2015

The World's Largest Collaborative Project: "Human Genome Project"


The Human Genome Project (HGP) is an international scientific research project with the goal of determining the sequence of chemical base pairs which make up human DNA, and of identifying and mapping all of the genes of the human genome from both a physical and functional standpoint. It remains the world's largest collaborative biological project. The project was proposed and funded by the US government; planning started in 1984, the project got underway in 1990, and was declared complete in 2003. A parallel project was conducted outside of government by the Celera Corporation, or Celera Genomics, which was formally launched in 1998. Most of the government-sponsored sequencing was performed in twenty universities and research centers in the United States, the United Kingdom, Japan, France, Germany, and China.
The Human Genome Project originally aimed to map the nucleotides contained in a human haploid reference genome (more than three billion). The "genome" of any given individual is unique; mapping "the human genome" involves sequencing multiple variations of each gene. The project did not study the entire DNA found in human cells; some heterochromatic areas (about 10% of the total genome) remain unsequenced.


History
Briefly, in May 1985 Robert Sinsheimer organized a workshop to discuss sequencing the human genome, but for a number of reasons the NIH was uninterested in pursuing the proposal. The following March, the Santa Fe Workshop was organized by Charles DeLisi and David Smith of the Department of Energy's Office of Health and Environmental Research (OHER). At the same time Renato Dulbecco proposed whole genome sequencing in an essay in Science. James Watson followed two months later with a workshop held at the Cold Spring Harbor Laboratory.
Dr. Alvin Trivelpiece sought and obtained the approval of DeLisi's proposal by Deputy Secretary William Flynn Martin. In the spring of 1986 Trivelpiece, then Director of the Office of Energy Research in the Department of Energy, briefed Martin and Under Secretary Joseph Salgado on his intention to reprogram $4 million to initiate the project with the approval of Secretary Herrington. This reprogramming was followed by a line item budget of $16 million in the Reagan Administration’s 1987 budget submission to Congress. It subsequently passed both Houses. The Project was planned for 15 years.
Candidate technologies were already being considered for the proposed undertaking at least as early as 1985.
In 1990, the two major funding agencies, DOE and NIH, developed a memorandum of understanding in order to coordinate plans and set the clock for the initiation of the Project to 1990. At that time, David Galas was Director of the renamed “Office of Biological and Environmental Research” in the U.S. Department of Energy’s Office of Science and James Watson headed the NIH Genome Program. In 1993, Aristides Patrinos succeeded Galas and Francis Collins succeeded James Watson, assuming the role of overall Project Head as Director of the U.S. National Institutes of Health (NIH) National Center for Human Genome Research (which would later become the National Human Genome Research Institute). A working draft of the genome was announced in 2000 and the papers describing it were published in February 2001. A more complete draft was published in 2003, and genome "finishing" work continued for more than a decade.

The Human Genome Project was declared complete in April 2003. An initial rough draft of the human genome was available in June 2000, and by February 2001 a working draft had been completed and published, followed by the final sequence mapping of the human genome on April 14, 2003. Although this was reported to cover 99% of the human genome with 99.99% accuracy, a major quality assessment of the human genome sequence published on May 27, 2004 indicated that over 92% of sampling exceeded 99.99% accuracy, which is within the intended goal. Further analyses and papers on the HGP continue to appear.

Applications

The sequencing of the human genome holds benefits for many fields, from molecular medicine to human evolution. Through its sequencing of the DNA, the Human Genome Project can support work in areas including: genotyping of specific viruses to direct appropriate treatment; identification of oncogenes and mutations linked to different forms of cancer; the design of medication and more accurate prediction of its effects; advancement in forensic applied sciences; biofuels and other energy applications; agriculture, livestock breeding, bioprocessing; risk assessment; bioarcheology, anthropology, evolution. Another proposed benefit is the commercial development of genomics research related to DNA-based products, a multibillion-dollar industry.
The sequence of the DNA is stored in databases available to anyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) house the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the UCSC Genome Browser at the University of California, Santa Cruz, and Ensembl, present additional data and annotation and powerful tools for visualizing and searching it. Computer programs have been developed to analyze the data, because the data itself is difficult to interpret without such programs. Generally speaking, advances in genome sequencing technology have followed Moore’s Law, a concept from computer science which states that integrated circuits can increase in complexity at an exponential rate. This means that the speeds at which whole genomes can be sequenced can increase at a similar rate, as was seen during the development of the above-mentioned Human Genome Project.
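The exponential-growth claim above can be made concrete with a small sketch. It is only an illustration: the function name `projected_throughput`, the starting throughput and the two-year doubling time are hypothetical values chosen for the example, not figures from the Human Genome Project.

```python
def projected_throughput(initial: float, doubling_years: float, years: float) -> float:
    """Throughput after `years`, assuming it doubles every `doubling_years`."""
    if doubling_years <= 0:
        raise ValueError("doubling_years must be positive")
    return initial * 2 ** (years / doubling_years)


# Hypothetical numbers: 1 unit of sequence per day at the start,
# doubling every 2 years, projected over a 13-year project.
for year in (0, 4, 8, 13):
    print(year, round(projected_throughput(1.0, 2.0, year), 1))
```

Under these assumed numbers the rate grows slowly at first and very quickly near the end, which is the general shape an exponential trend implies.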

Genome donors

In the IHGSC international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of many collected samples were processed as DNA resources. Thus the donor identities were protected so neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project, with most of those libraries being created by Dr. Pieter J. de Jong's lab. Much of the sequence (>70%) of the reference genome produced by the public HGP came from a single anonymous male donor from Buffalo, New York (code name RP11).
HGP scientists used white blood cells from the blood of two male and two female donors (randomly selected from 20 of each) – each donor yielding a separate DNA library. One of these libraries (RP11) was used considerably more than others, due to quality considerations. One minor technical issue is that male samples contain just over half as much DNA from the sex chromosomes (one X chromosome and one Y chromosome) compared to female samples (which contain two X chromosomes). The other 22 chromosomes (the autosomes) are the same for both sexes.
Although the main sequencing phase of the HGP has been completed, studies of DNA variation continue in the International HapMap Project, whose goal is to identify patterns of single-nucleotide polymorphism (SNP) groups (called haplotypes, or “haps”). The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese people in Tokyo; Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States having ancestry from Western and Northern Europe.
In the Celera Genomics private-sector project, DNA from five different individuals was used for sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged (in a public letter to the journal Science) that his DNA was one of 21 samples in the pool, five of which were selected for use.
In 2007, a team led by Jonathan Rothberg published James Watson's entire genome, unveiling the six-billion-nucleotide genome of a single individual for the first time.

Benefits

The work on interpretation of genome data is still in its initial stages. It is anticipated that detailed knowledge of the human genome will provide new avenues for advances in medicine and biotechnology. Clear practical results of the project emerged even before the work was finished. For example, a number of companies, such as Myriad Genetics, started offering easy ways to administer genetic tests that can show predisposition to a variety of illnesses, including breast cancer, hemostasis disorders, cystic fibrosis, liver diseases and many others. Also, the etiologies for cancers, Alzheimer's disease and other areas of clinical interest are considered likely to benefit from genome information and possibly may lead in the long term to significant advances in their management.
There are also many tangible benefits for biologists. For example, a researcher investigating a certain form of cancer may have narrowed down their search to a particular gene. By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) the three-dimensional structure of its product, its function(s), its evolutionary relationships to other human genes, or to genes in mice or yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which this gene is activated, and diseases associated with this gene or other datatypes. Further, deeper understanding of the disease processes at the level of molecular biology may determine new therapeutic procedures. Given the established importance of DNA in molecular biology and its central role in determining the fundamental operation of cellular processes, it is likely that expanded knowledge in this area will facilitate medical advances in numerous areas of clinical interest that may not have been possible without it.

The project has inspired and paved the way for genomic work in other fields, such as agriculture. For example, by studying the genetic composition of Triticum aestivum, the world’s most commonly used bread wheat, great insight has been gained into the ways that domestication has impacted the evolution of the plant. Which loci are most susceptible to manipulation, and how does this play out in evolutionary terms? Genetic sequencing has allowed these questions to be addressed for the first time, as specific loci can be compared in wild and domesticated strains of the plant. This will allow for advances in genetic modification in the future which could yield healthier, more disease-resistant wheat crops.

THE PROCESS OF DECIPHERING THE HUMAN GENOME

1. Experimental Procedures
a) Since DNA varies from one individual to another at roughly 1 nucleotide in every 500, cutting DNA with restriction enzymes produces a polymorphic pattern of fragments (restriction fragment length polymorphisms, RFLPs). These can be employed in genetic mapping by finding RFLPs that are inherited together with particular traits (markers).
b) Pulsed-field gel electrophoresis (PFGE) enables separation of large DNA fragments up to 10 million bp (base pairs).
c) Polymerase chain reaction (PCR) enables a manifold amplification of a DNA sequence, providing working means for analyzing minute amounts of DNA.
d) Yeast artificial chromosome (YAC) enables cloning of large DNA segments up to 1 million bp.
e) Sequence-tagged site (STS), the common mapping language, is a short (100-1000 bp) DNA segment, unique in the genome, defined by a pair of PCR primers. Genomatron [7] is an automated system that can screen hundreds of STSs in hours.
f) "Positional candidate" strategy is predicted to become the major technique for identifying disease genes. The approach is based on an efficient three-step process: i) localizing a disease gene to a chromosomal subregion (using the traditional linkage analysis); ii) searching databases for an attractive candidate gene within that subregion; and iii) testing the candidate gene for disease-causing mutations. It is believed that by the first quarter of 1995, the approach had helped identify more than fifty disease genes.
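Item (a) above can be illustrated with a toy digest. The sketch below is a simplified example under stated assumptions: the two short "alleles" are made-up sequences, `digest` is a hypothetical helper, and only the EcoRI recognition site (GAATTC, cut after the first G) is taken from standard molecular biology. A single nucleotide difference removes one cut site, so the two alleles give different fragment-length patterns.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the position of the cut inside the recognition site.
    """
    cuts = []
    start = 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"  # EcoRI cuts between G and AATTC
ECORI_OFFSET = 1

# Made-up 52-base allele containing two EcoRI sites.
allele_a = (
    "ATGCATGCATGC" + "GAATTC" + "CCGGTTAACCGGTTAA" + "GAATTC" + "TTAGGCTTAGGC"
)
# Same allele with one base changed (A -> G) inside the second site.
allele_b = allele_a[:36] + "G" + allele_a[37:]

print(digest(allele_a, ECORI_SITE, ECORI_OFFSET))  # three fragments
print(digest(allele_b, ECORI_SITE, ECORI_OFFSET))  # two fragments
```

On a gel, the differing fragment lengths would appear as different band patterns, which is what allows such a site to serve as a marker.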

2. The Undertaking of the Human Genome Project

Following the founding of the HGP in 1984, the effort to sequence the entire human genome began. It was advocated by several scientists, including Robert Sinsheimer (then chancellor of the University of California, Santa Cruz), Charles DeLisi (DOE) and Renato Dulbecco (then president of the Salk Institute).
In September of 1986, a National Research Council committee was asked to determine whether the HGP should be advanced. In February of 1988, the committee recommended its implementation, with the NIH playing a central role. Two months later, another committee, appointed by the U.S. Congress Office of Technology Assessment, released a report supporting the recommendation of the National Research Council committee. That same year, Congress appropriated $17.3 million to the NIH and $11.8 million to the DOE for genome research. An NIH office, the Center for Human Genome Research, was created. It was later renamed the National Center for Human Genome Research (NCHGR).
In early 1990, the NIH and DOE, partners in managing the HGP, presented to Congress a five-year term program, coordinated by the joint Subcommittee on the Human Genome, with seven major goals:
a) To develop maps of human chromosomes.
b) To improve technology for DNA sequencing.
c) To map and sequence the DNA of selected model organisms (mouse, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Escherichia coli).
d) To collect, manage and distribute data (bioinformatics).
e) To study the legal, social and ethical issues involved and to develop policy options.
f) To develop and improve technology.
g) To facilitate the transfer of technology.
A number of bioinformatics databases have been created, such as the Genome Data Base, which specializes in human genetic maps, supported by the NIH and DOE at the Johns Hopkins University Welch Medical Library in Baltimore [8].
A program for the ethical, legal, and social implications, ELSI, has been launched.
Progress for the first five-year period was right on schedule, especially in genetic mapping and the sequencing of model organisms, while sequencing techniques were progressively improved. The results of the linkage map were published in the "Index Marker Catalog" of the NCHGR, and mapping at 10-15 cM (centimorgan) resolution was completed in 1993.