Information on the Variant Browser
There will eventually be four different variant types in the variant browser: SNPs, insertions and deletions (indels), copy number variants (CNVs), and inversions. At present it is only possible to search for SNPs and indels (EGW, August 2009).
- Searching for SNPs with the Function type "Mis/non-sense" does not work (for example, try Tpmt)
- It is not yet possible to show only the SNPs that differ among a selected subset of strains
- Information on the strand used to call the SNP is ambiguous
- The Function text reads either "synonymous" or "Silent"; all entries should read "synonymous"
- Selecting InDel should reset the Function settings
You can browse SNPs either by entering a gene symbol or SNP ID, which takes priority, or by defining a viewing range (currently limited to a maximum of 5,000 SNPs or 50 Mb).
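The query rules above can be sketched as a small validation step. This is a minimal illustration, not the browser's actual code: the helper names are hypothetical, and the sketch assumes that a gene symbol or SNP ID overrides any range, that ranges longer than 50 Mb are rejected, and that results beyond 5,000 SNPs are simply truncated.

```python
MAX_SPAN_MB = 50.0  # range limit stated in the help text
MAX_SNPS = 5000     # result-count limit stated in the help text


def choose_query_mode(symbol_or_id=None, start_mb=None, end_mb=None):
    """Return 'identifier' or 'range' (hypothetical helper).

    Assumption: a gene symbol or SNP ID, if given, wins over any range.
    """
    if symbol_or_id:
        return "identifier"
    if start_mb is None or end_mb is None:
        raise ValueError("need a gene symbol/SNP ID or both start and end positions")
    if end_mb <= start_mb:
        raise ValueError("end position must be greater than start position")
    if end_mb - start_mb > MAX_SPAN_MB:
        raise ValueError(f"range exceeds the {MAX_SPAN_MB:g} Mb limit")
    return "range"


def cap_results(snps):
    """Keep at most MAX_SNPS rows (assumed truncation, not rejection)."""
    return snps[:MAX_SNPS]


print(choose_query_mode("Tpmt", 10.0, 100.0))  # identifier: symbol beats range
print(choose_query_mode(None, 10.0, 40.0))     # range: 30 Mb is within the limit
```

Whether the real browser truncates or refuses an over-limit result set is not stated above; the truncation in `cap_results` is only one plausible choice.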
Genotypes are from Celera Genomics, the Perlegen/NIEHS resequencing project, the Wellcome-CTC SNP Project, dbSNP, the Center for Integrative and Translational Genomics (CITG) at the University of Tennessee Health Science Center, and the Mouse Phenome Database (MPD).
In brief, the column headers are as follows:
[Blank]: A sequential, temporary number assigned to each SNP returned by the search.
ID: Either the official NCBI reference SNP (rs) number or the identifier assigned by the institution that found the SNP. Many SNPs have multiple local IDs (and therefore duplicate records). mCV IDs are from Celera, NES IDs are from Perlegen, and MRS IDs (Memphis reference SNPs) are from UTHSC. The MRS SNPs are a high-quality subset of about 2.8 million SNPs derived from SOLiD short-read sequencing of DBA/2J (about 25x shotgun coverage) performed by Williams and colleagues. The MRS SNP calls were generated by Dr. Xusheng Wang and entered into the Variant Browser by Evan Williams (August 2009). Each ID links to a small table with additional information.
Chr: The chromosome on which the SNP is located.
Mb: The position of the SNP in megabases. Positions are currently based on NCBI Mouse Genome Build 37.1 (UCSC mm9, July 2007). The link uses the sequence flanking the SNP (if available) to verify its location with a BLAT alignment in the UCSC Genome Browser.
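Because the Mb column is in megabases, linking a SNP to a genome browser requires converting it to base-pair coordinates. The sketch below builds a plain position link to the UCSC Genome Browser on mm9 using the standard `hgTracks` `db` and `position` parameters. It is a simpler, coordinate-based alternative and not the BLAT-based link the Variant Browser actually uses; the helper names and the 100 bp window are illustrative assumptions.

```python
from urllib.parse import urlencode

UCSC_BASE = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def mb_to_bp(mb):
    """Convert a position in megabases to an integer base-pair coordinate."""
    return round(mb * 1_000_000)


def ucsc_position_link(chrom, mb, flank_bp=100, db="mm9"):
    """Hypothetical helper: link to a window of +/- flank_bp around the SNP."""
    center = mb_to_bp(mb)
    start = max(1, center - flank_bp)  # UCSC positions are 1-based
    end = center + flank_bp
    query = urlencode({"db": db, "position": f"chr{chrom}:{start}-{end}"})
    return f"{UCSC_BASE}?{query}"


print(ucsc_position_link("1", 12.345678))
```

A coordinate link like this only shows the region; unlike a BLAT of the flanking sequence, it cannot confirm that the reported position is correct.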
Domain: If applicable, the region of the gene in which the SNP is located.
Gap: The distance from one SNP to the next. SNPs with a gap of zero are duplicate records at the same position and should contain the same data. Checking "non-redundant" ensures that each SNP is listed only once, but some allele data may be hidden.
Gene: The gene in which the SNP is located, if applicable. The link opens the gene's entry at NCBI.
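The Gap column and the "non-redundant" option can be illustrated with a short sketch. It assumes positions in base pairs (the browser's own Gap units are not stated here), computes the distance from each SNP to the next one on the same chromosome, and keeps only the first record at each position. The record type, IDs, and "keep the first" rule are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SnpRow:
    snp_id: str
    chrom: str
    pos_bp: int  # position in base pairs (Mb column x 1,000,000)


def gaps(rows: List[SnpRow]) -> List[Tuple[str, Optional[int]]]:
    """Pair each SNP with the distance (bp) to the next SNP on the same chromosome.

    The last SNP on each chromosome gets None because there is no next SNP.
    """
    ordered = sorted(rows, key=lambda r: (r.chrom, r.pos_bp))  # stable sort
    result = []
    for i, cur in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is None or nxt.chrom != cur.chrom:
            result.append((cur.snp_id, None))
        else:
            result.append((cur.snp_id, nxt.pos_bp - cur.pos_bp))
    return result


def non_redundant(rows: List[SnpRow]) -> List[SnpRow]:
    """Keep only the first record at each (chromosome, position)."""
    seen = set()
    kept = []
    for row in sorted(rows, key=lambda r: (r.chrom, r.pos_bp)):
        key = (row.chrom, row.pos_bp)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    SnpRow("rs1", "1", 100),
    SnpRow("mCV1", "1", 100),  # duplicate record at the same position (gap 0)
    SnpRow("rs2", "1", 250),
]
print(gaps(rows))
print([r.snp_id for r in non_redundant(rows)])
```

Dropping the later duplicate is why allele data can disappear: if `mCV1` carried genotypes for strains that `rs1` lacks, those calls are no longer shown.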
Conservation: How conserved the SNP position is across species (mammals only: rat, rabbit, human, chimpanzee, rhesus monkey, dog, cow, armadillo, elephant, tenrec, and opossum). A high score means the site is highly conserved: nearly every species carries the same allele. A low score means the two alleles are more evenly distributed across species. This score was downloaded from the UCSC Vertebrate Multiz Alignment Conservation scores in mid-2009, using the following configuration parameters and species:
Alleles: The major and minor alleles of the SNP, ranked by frequency. Imputed and known SNPs are currently counted separately, so you may see t/T, where the imputed allele t is the most common and the known allele T is the least common. This is unusual, however; in most cases the two agree.
Source: The source of the SNP. The vast majority (99%) are from Celera, Perlegen, or UTHSC. Many Perlegen SNPs are labeled "Perl Impute"; these contain imputed SNP data from The Jackson Laboratory but are otherwise the same as the Perl/NIEHS SNPs.
129S1/SvImJ: The first, alphabetically, of the 74 strain columns, each of which shows that strain's allele for the SNP.
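The t/T case above follows from ranking allele symbols by frequency while treating case as significant. The sketch below reproduces that behavior under stated assumptions: lowercase means an imputed call, uppercase a known call, and missing calls appear as `None`, an empty string, or `-`. The function name and the `merge_imputed` option are hypothetical; the option shows what pooling the two counts would look like.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

MISSING = {None, "", "-"}  # assumed encodings for a missing call


def major_minor(calls: Iterable[Optional[str]],
                merge_imputed: bool = False) -> Optional[Tuple[str, Optional[str]]]:
    """Return (major, minor) allele symbols ranked by frequency across strains.

    By default 't' (imputed) and 'T' (known) count as different symbols,
    mirroring the separate counting described above. With merge_imputed=True,
    case is ignored and the counts are pooled.
    """
    symbols = [c for c in calls if c not in MISSING]
    if merge_imputed:
        symbols = [c.upper() for c in symbols]
    if not symbols:
        return None
    ranked = Counter(symbols).most_common()  # most to least frequent
    major = ranked[0][0]
    minor = ranked[-1][0] if len(ranked) > 1 else None
    return major, minor


calls = ["t", "t", "t", "T", "-"]  # three imputed t, one known T, one missing
print(major_minor(calls))                      # ('t', 'T')
print(major_minor(calls, merge_imputed=True))  # ('T', None)
```

Counting case-sensitively turns what is really a single allele into an apparent t/T polymorphism, which is why such entries are flagged above as unusual.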
For a more detailed explanation of the symbols used, see the field descriptions for a similarly structured SNP browser from Jackson Labs.
The InDel data currently come from a comparison of the genome of the C57BL/6J mouse strain with that of the DBA/2J strain. C57BL/6J is treated as the reference, and plus and minus symbols in the Size field indicate a loss or gain of sequence in the other strain (currently only DBA/2J). These data are from UTHSC SOLiD sequencing and were analyzed by Dr. Xusheng Wang. As this feature develops, InDel data from the Sanger Institute and other sources will be added.
Copy Number Variant data are currently not available in the Variant Browser, but the data should be available within the next few months (EGW: August 2009).
Transposon data are currently not available in the Variant Browser, but the data should be available within the next few months (EGW: August 2009).