Introduction



The objective of the Human Genome Structural Variation Project is to characterize the extent of structural variation (deletions, insertions, inversions, and other rearrangements) at the sequence level (HGSV Working Group, 2007). This database describes results of an initial analysis of eight individuals. The analyzed samples have been previously studied as part of the International HapMap Project. Corresponding DNA samples and cell lines are available from the Coriell Institute for Medical Research.

This project has employed a clone-based method to systematically identify and sequence structural variants genome-wide. The approach is essentially as described in Tuzun et al. (2005). Briefly, from each sample approximately 1 million fragments, each about 40 kbp in size, are cloned into fosmids and end-sequenced. The corresponding end-sequence pairs (ESPs) are then mapped against the human genome reference assembly (NCBI build 35, UCSC hg17), and potential sites of variation are identified by clusters of ESPs that map too far apart, too close together, or in an inappropriate orientation. Sequence that is present in the sampled individuals but not represented in the reference genome assembly is also identified. Additionally, single nucleotide variants and small insertion-deletion variants (<100 bp) can be identified from the sequences of the mapped ESPs.
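The classification of a single mapped ESP can be illustrated with a minimal sketch. It is not the project's actual pipeline: the names MappedEnd and classify_esp are hypothetical, and the 32,000 and 48,000 bp span limits are assumed placeholder values around the ~40 kbp insert size, not thresholds taken from this page. The sketch also assumes that a concordant pair has its lower-coordinate end on the "+" strand and its higher-coordinate end on the "-" strand. It labels individual pairs only; grouping discordant pairs into clusters is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedEnd:
    """One end sequence placed on the reference assembly (hypothetical type)."""
    chrom: str
    pos: int      # reference coordinate of the mapped end, in bp
    strand: str   # "+" or "-"


def classify_esp(end_a, end_b, min_span=32_000, max_span=48_000):
    """Label one end-sequence pair by how its two ends map to the reference.

    min_span and max_span are assumed, illustrative limits on the expected
    distance between the two ends of a ~40 kbp fosmid insert.
    """
    if end_a.chrom != end_b.chrom:
        return "different_chromosomes"

    # Order the ends by reference coordinate before checking orientation.
    first, second = sorted((end_a, end_b), key=lambda e: e.pos)

    # Assumed concordant orientation: "+" at the lower coordinate,
    # "-" at the higher coordinate. Anything else is flagged.
    if (first.strand, second.strand) != ("+", "-"):
        return "orientation"

    span = second.pos - first.pos
    if span > max_span:
        # Ends map too far apart: consistent with a deletion in the
        # sample relative to the reference.
        return "too_far"
    if span < min_span:
        # Ends map too close together: consistent with an insertion in
        # the sample relative to the reference.
        return "too_close"
    return "concordant"


if __name__ == "__main__":
    pair = (MappedEnd("chr1", 100_000, "+"), MappedEnd("chr1", 180_000, "-"))
    print(classify_esp(*pair))
```

A site would normally be called only when several independent clones carry the same label at overlapping positions, which is why the text above refers to clusters of ESPs rather than single pairs.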

This database contains the data reported in Kidd et al. (2008). The browser is a modified mirror of the UCSC Genome Browser. Most functionality has been retained, including the ability to search for genes, create PostScript screenshot images, and run queries with the Table Browser. A more complete description of the available tracks is given here. Files for bulk data download (such as clone placements and ESP alignments) are available here.

A listing of the individual sample-level validated sites of structural variation is available here.