(Updated Jan 20, 2017 by RW Williams)
BXD Genotypes file status (January 2017): From September 2016 to January 2017, Robert Williams, Jesse Ingels, Lu Lu, and Danny Arends released new genotype files for many of the original BXD strains (BXD1 through BXD102), and for all of the new strains (BXD104 to BXD220). Genotypes were generated at about 74,000 SNPs using the GigaMUGA array developed by Drs. Fernando Pardo-Manuel de Villena (University of North Carolina) and Gary Churchill (The Jackson Laboratory). Genotypes were generated at GeneSeek (Neogen Inc) with financial support from the University of Tennessee Center for Integrative and Translational Genomics.
The new genotypes are now available in GeneNetwork as the 2017 Genotype file. All SNPs were mapped to the older July 2007 mm9 NCBI Build 37 assembly and to the newer Dec 2011, mm10, GRCm38 assembly. Of the 74,000 GigaMUGA and many other markers we have typed, only about 7300 are useful (informative) in defining recombination events in the current set of BXDs. These informative markers either define unique recombination patterns across the 198 BXD strains (including strains that are now extinct) or they define the proximal and distal ends of regions that do not contain any known recombinations in the BXD family.
The file does include markers for Chr Y and the mitochondrion, but these data are of poor quality and will be revised later in 2017.
As of Jan 2017 GeneNetwork uses mm10 coordinates for mapping functions. Older mm9 versions of GeneNetwork are available on the GN TimeMachine (see upper right side of Search page).
BXD Genotype: The state of a gene or DNA sequence, usually used to describe a contrast between two or more states, such as that between the normal state (wildtype) and a mutant state (mutation) or between the alleles inherited from two parents. All species that are included in GeneNetwork are diploid (derived from two parents) and have two copies of most genes (genes located on the X and Y chromosomes are exceptions). As a result, the genotype of a particular diploid individual is actually a pair of genotypes, one from each parent. For example, the offspring of a mating between strain A and strain B will have one copy of the A genotype and one copy of the B genotype and therefore have an A/B genotype. In contrast, offspring of a mating between a female strain A and a male strain A will inherit only A genotypes and have an A/A genotype.
Genotypes can be measured or inferred in many different ways, even by visual inspection of animals (e.g. as Gregor Mendel did long before DNA was discovered). The typical method now is to directly test DNA that has a well-defined chromosomal location and that has been obtained from one or, usually, many cases, using molecular tests that often rely on polymerase chain reaction steps and sequence analysis. Each case is genotyped at many chromosomal locations (loci, markers, or genes). The entire collection of genotypes (as many as 1 million for a single case) is also sometimes referred to as the case's genotype, but the word "genometype" might be more appropriate to highlight the fact that we are now dealing with a set of genotypes spanning the entire genome (all chromosomes) of the case.
For gene mapping purposes, genotypes are often translated from letter codes (A/A, A/B, and B/B) to simple numerical codes that are more suitable for computation. A/A might be represented by the value -1, A/B by the value 0, and B/B by the value +1. This recoding makes it easy to determine if there is a statistically significant correlation between genotypes across a set of cases (for example, an F2 population or a Genetic Reference Panel) and a variable phenotype measured in the same population. A locus with a sufficiently high correlation between genotypes and phenotypes is referred to as a quantitative trait locus (QTL). If the correlation is almost perfect (r > 0.9), then the locus is usually referred to as a Mendelian locus. Despite the fact that we use the term "correlation" in the preceding sentences, the genotype is actually the cause of the phenotype. More precisely, variation in the genotypes of individuals in the sample population causes the variation in the phenotype. The statistical confidence of this assertion of causality is often estimated using LOD and LRS scores and permutation methods. If the LOD score is above 10, then we can be extremely confident that we have located a genetic cause of variation in the phenotype. While the location (the locus) is usually defined with a precision ranging from 10 million to 100 thousand basepairs, the individual sequence variant that is responsible may be quite difficult to extract. Think of this in terms of police work: we may know the neighborhood where the suspect lives, we may have clues as to identity and habits, but we still may have a large list of suspects.
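The recoding and correlation step described above can be sketched in a few lines of Python. This is only an illustration, not GeneNetwork's own code: the marker and trait values are hypothetical, and the conversions used here (LRS = n ln(1 / (1 - r^2)) for single-marker regression, and LOD = LRS / (2 ln 10), roughly LRS / 4.61) are the standard textbook relations rather than anything stated on this page.

```python
import math

# Numerical codes from the text: A/A -> -1, A/B -> 0, B/B -> +1.
GENOTYPE_CODES = {"A/A": -1.0, "A/B": 0.0, "B/B": 1.0}


def encode(genotypes):
    """Translate letter-coded genotypes into numerical codes."""
    return [GENOTYPE_CODES[g] for g in genotypes]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lrs_from_r(r, n):
    """Likelihood ratio statistic for single-marker regression (assumed form)."""
    return n * math.log(1.0 / (1.0 - r * r))


def lod_from_lrs(lrs):
    """Convert LRS to LOD: LOD = LRS / (2 ln 10)."""
    return lrs / (2.0 * math.log(10.0))


# Hypothetical data: one marker and one trait measured in eight cases.
marker = encode(["A/A", "A/A", "B/B", "B/B", "A/A", "B/B", "A/A", "B/B"])
trait = [10.1, 9.8, 12.2, 12.9, 10.4, 11.7, 9.5, 12.4]

r = pearson_r(marker, trait)
lrs = lrs_from_r(r, len(trait))
lod = lod_from_lrs(lrs)
print(f"r = {r:.3f}, LRS = {lrs:.2f}, LOD = {lod:.2f}")
```

In practice the significance of such a score would be judged against a permutation threshold, as noted above, rather than against a fixed cutoff.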
The BXD genotype file was initially upgraded in 2010-2011 using the new high-density Affymetrix array (580,000 high-quality SNPs) developed in the laboratories of Drs. Fernando Pardo-Manuel de Villena (University of North Carolina) and Gary Churchill (The Jackson Laboratory; see Yang H, Ding Y, Hutchins LN, Szatkiewicz J, Bell TA, Paigen BJ, Graber JH, Pardo-Manuel de Villena F, Churchill GA (2009) A customized and versatile high-density genotyping array for the mouse. Nat Methods 6:663-666).
The BXD genotype file used from June 2005 through December 2016 exploits a set of approximately 3796 markers typed across 88 extant and extinct BXD strains (BXD1 through BXD102). The mean interval between informative markers is about 0.7 Mb. This genotype file includes all markers, both SNPs and microsatellites, with unique strain distribution patterns (SDPs), as well as pairs of markers for those SDPs represented by two or more markers. In those situations where three or more markers had the same SDP, we retained only the most proximal and most distal markers in the genotype file. This particular file has also been smoothed to eliminate genotypes that are likely to be erroneous. We have also conservatively imputed a small number of missing genotypes (usually over very short intervals). Smoothing genotypes in this way reduces the total number of SDPs and also lowers the rate of false discovery. However, this procedure may also eliminate some genuine SDPs.
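The marker-thinning rule described above (keep a marker with a unique SDP, keep both markers of a pair, and keep only the most proximal and most distal marker when three or more share an SDP) can be sketched as follows. This is a minimal illustration under stated assumptions: markers are taken to be ordered from proximal to distal on one chromosome, "same SDP" is interpreted as a run of consecutive markers, and the function name, marker names, and SDP strings are all hypothetical.

```python
def thin_markers(markers):
    """Keep the first and last marker of each run of consecutive identical SDPs.

    `markers` is a list of (name, sdp) tuples ordered from proximal to distal
    along one chromosome; `sdp` is a string with one genotype letter per strain.
    """
    kept = []
    run_start = 0
    for i in range(1, len(markers) + 1):
        # A run ends at the end of the list or when the SDP changes.
        if i == len(markers) or markers[i][1] != markers[run_start][1]:
            kept.append(markers[run_start])      # most proximal marker of the run
            if i - 1 > run_start:
                kept.append(markers[i - 1])      # most distal marker of the run
            run_start = i
    return kept


# Hypothetical example: four strains, six markers on one chromosome.
example = [
    ("m1", "BBDD"), ("m2", "BBDD"), ("m3", "BBDD"),
    ("m4", "BDDD"),
    ("m5", "BDDB"), ("m6", "BDDB"),
]
print([name for name, _ in thin_markers(example)])
```

Here the interior marker m2 is dropped because m1 and m3 already define the ends of the interval that shares its SDP.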
The new smoothed BXD genotype data file (2017) can be downloaded from GeneNetwork at the URL http://www.genenetwork.org/genotypes/BXD.geno.
Please Note: For a limited number of markers and strains, the genotypes of BXDs have been called heterozygous. This is usually done over comparatively short intervals in some of the newer strains that may not have been fully inbred when they were initially genotyped. Use of the genotype file above in external software packages such as R/qtl requires careful treatment of this issue to prevent bias in empirical significance thresholds. It is recommended to treat these rare heterozygous loci as missing data and to ensure that only the additive effects of B vs. D alleles are estimated by these packages. (Note by Elissa Chesler, Dec 2010.)
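One way to follow this recommendation before handing the data to an external package is to recode heterozygous calls as missing while reading the file. The sketch below is illustrative only. It assumes a layout that is not described on this page: metadata lines beginning with "@" or "#", followed by a tab-delimited table with Chr, Locus, cM, and Mb columns and one column per strain, using B, D, H (heterozygous), and U (unknown) as genotype codes. The sample text, the function name, and the -1/+1 coding are hypothetical.

```python
import csv
import io

# Hypothetical excerpt in the assumed layout: "@" metadata lines, then a
# tab-delimited table with Chr, Locus, cM, Mb, and one column per strain.
SAMPLE = (
    "@name:BXD\n"
    "@mat:B\n"
    "@pat:D\n"
    "@het:H\n"
    "@unk:U\n"
    "Chr\tLocus\tcM\tMb\tBXD1\tBXD2\tBXD5\n"
    "1\tmarkerA\t1.5\t3.0\tB\tD\tH\n"
    "1\tmarkerB\t2.0\t4.1\tB\tH\tU\n"
)

# Additive coding for B vs. D (an assumption); H and U are not in this
# mapping, so they come back as None, i.e. missing data.
ADDITIVE = {"B": -1.0, "D": 1.0}


def read_geno(text):
    """Return {locus: {strain: code or None}} with H and U treated as missing."""
    rows = [ln for ln in text.splitlines()
            if ln and not ln.startswith(("@", "#"))]
    reader = csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t")
    fixed = {"Chr", "Locus", "cM", "Mb"}
    table = {}
    for row in reader:
        table[row["Locus"]] = {
            strain: ADDITIVE.get(call)
            for strain, call in row.items() if strain not in fixed
        }
    return table


genotypes = read_geno(SAMPLE)
print(genotypes["markerA"])
```

With heterozygous calls stored as missing, a downstream model sees only two genotype classes per marker and so estimates only the additive B vs. D effect.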
Source of Genotypes:
In collaboration with members of the CTC (Richard Mott, Jonathan Flint, and colleagues), we have helped genotype a total of 480 strains using a panel of 13,377 SNPs. These SNPs were combined with our previous microsatellite genotypes to produce the older "classic" consensus maps for the expanded set of BXDs using the older mouse assemblies (Mouse Build 36 - UCSC mm8 and then mm9). (Files were updated from mm6 to mm8 in January 2007, and from mm9 to mm10 in January 2017).
A total of 198 strains have been genotyped as of Jan 2017 using the full set of SNPs, and about 7324 of these SNPs are informative. Informative in this sense simply means that the C57BL/6J and DBA/2J parental strains have different alleles. To reduce false positive errors when mapping using this ultra-dense map, we have eliminated most single genotypes that generate double-recombinant haplotypes, which are most commonly produced by typing errors ("smoothed" genotypes). For this reason, the genotypes used in GeneNetwork differ from those downloaded directly from Richard Mott's web site at the Wellcome Trust, Oxford or from the Jackson Laboratory.
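The idea behind smoothing can be sketched as follows: within one strain, a single call that disagrees with both of its neighbors implies two recombinations in a very short interval and is more likely a typing error. This is only a simplified illustration and not the procedure actually used to build the file: it ignores the physical distance between markers, and the function name and example calls are hypothetical.

```python
def smooth_singletons(calls):
    """Replace isolated calls that imply a double recombinant.

    `calls` is one strain's genotypes ("B" or "D") at consecutive markers on
    one chromosome. A B or D call is changed only when both neighbours agree
    with each other and disagree with it. Neighbours are read from the
    original calls, so one correction never triggers another.
    """
    smoothed = list(calls)
    for i in range(1, len(calls) - 1):
        left, mid, right = calls[i - 1], calls[i], calls[i + 1]
        if (left == right and mid != left
                and left in ("B", "D") and mid in ("B", "D")):
            smoothed[i] = left
    return smoothed


# Hypothetical strain: the lone "D" at the third marker is smoothed away,
# while the genuine B-to-D transition further along is kept.
before = list("BBDBBDDD")
after = smooth_singletons(before)
print("".join(before), "->", "".join(after))
```

A rule of this kind trades a small risk of erasing a genuine short haplotype block for a lower rate of false recombinations.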
We have genotyped all available BXD strains from The Jackson Laboratory. BXD1 through BXD32 were produced by Benjamin Taylor starting in the late 1970s. BXD33 through BXD42 were produced by Taylor in the 1990s (Taylor et al., 1999). All BXD strains with numbers higher than BXD42 (BXD43 through BXD100) were generated by Lu Lu and Robert Williams at UTHSC, and by Jeremy Peirce and Lee Silver at Princeton University. We thank Guomin Zhou for generating the advanced intercross stock used to produce most of these advanced RI strains both at UTHSC and Princeton. There are approximately 48 of these advanced BXD strains, each of which archives approximately twice the recombinations present in a typical F2-derived recombinant inbred strain (Peirce et al. 2003).
Due to the very high density of markers, the mapping algorithm used to map BXD data sets has been modified and is a mixture of simple marker regression, linear interpolation, and standard Haley-Knott interval mapping. When two adjacent markers have identical SDPs, they will have identical linkage statistics, as will the entire interval between these two markers (assuming complete and error-free haplotype data for all strains). On a physical map the LRS and the additive effect values will therefore be constant over this physical interval. Between neighboring markers that have different SDPs and that are separated by 1 cM or more, we use a conventional interval mapping method (Haley-Knott) combined with a Haldane estimate of genetic distance. When the interval is less than 1 cM, we simply interpolate linearly between markers based on a physical scale between those markers. The result of this mixture mapping algorithm is a linkage map of a trait that has an unusual profile that is particularly striking on a physical (Mb) scale, with many plateaus, abrupt linear transitions between plateaus, and a few regions with the standard graceful curves typical of interval maps.
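The decision logic of this mixture can be sketched as below. This is a hedged illustration, not GeneNetwork's implementation: the function names are hypothetical, the Haley-Knott regression itself is not shown, and the only formulas used are the standard Haldane map function, r = 0.5 (1 - exp(-2d)) with d in Morgans, and ordinary linear interpolation on the Mb scale.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Haldane map function: r = 0.5 * (1 - exp(-2d)), with d in Morgans."""
    d = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def interpolate_lrs(mb, left_mb, left_lrs, right_mb, right_lrs):
    """Linear interpolation of the LRS at position `mb` on the physical scale."""
    fraction = (mb - left_mb) / (right_mb - left_mb)
    return left_lrs + fraction * (right_lrs - left_lrs)


def choose_method(left_sdp, right_sdp, distance_cm):
    """Pick the treatment of the interval between two adjacent markers."""
    if left_sdp == right_sdp:
        return "plateau"        # identical SDPs: constant LRS across the interval
    if distance_cm >= 1.0:
        return "haley-knott"    # interval mapping with Haldane distances
    return "linear"             # short interval: interpolate on the Mb scale


# Hypothetical intervals between adjacent markers.
print(choose_method("BBDD", "BBDD", 0.4))
print(choose_method("BBDD", "BDDD", 2.5))
print(choose_method("BBDD", "BDDD", 0.3))
print(interpolate_lrs(10.5, 10.0, 8.0, 11.0, 12.0))
print(round(haldane_recombination_fraction(2.5), 4))
```

The three branches correspond to the plateaus, the smooth interval-mapping curves, and the abrupt linear transitions seen in the resulting map.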
Archival BXD Genotype file: Prior to July 2005, the marker genotypes used to map all BXD data sets consisted of a set of 779 markers described by Williams and colleagues (2001) that also included a small number of additional SNPs from Tim Wiltshire and Mathew Pletcher (GNF, La Jolla), new microsatellite markers generated by Grant Morahan and Jing Gu (Msw type markers), and a few CTC markers by Jing Gu. This old marker data set was made obsolete by the ultra-high-density Illumina SNP genotype data generated in spring 2005.
The entire BXD genotype data set used for mapping traits can be downloaded at BXD.geno.
The majority of SNP genotypes were generated at GeneSeek using the GigaMUGA array, at UNC using the Affymetrix mouse genotyping array, and at Illumina with support from the Wellcome Trust. The selection of markers to include in the final file was carried out by Robert W. Williams and Danny Arends in December 2017.
Dietrich WF, Katz H, Lincoln SE (1992) A genetic map of the mouse suitable for typing in intraspecific crosses. Genetics 131:423-447.
Taylor BA, Wnek C, Kotlus BS, Roemer N, MacTaggart T, Phillips SJ (1999) Genotyping new BXD recombinant inbred mouse strains and comparison of BXD and consensus maps. Mamm Genome 10:335-348.
Williams RW, Gu J, Qi S, Lu L (2001) The genetic structure of recombinant inbred mice: High-resolution consensus maps for complex trait analysis. Genome Biology 2:RESEARCH0046.
Wiltshire T, Pletcher MT, Batalov S, Barnes SW, Tarantino LM, Cooke MP, Wu H, Smylie K, Santrosyan A, Copeland NG, Jenkins NA, Kalush F, Mural RJ, Glynne RJ, Kay SA, Adams MD, Fletcher CF (2003) Genome-wide single-nucleotide polymorphism analysis defines haplotype patterns in mouse. Proc Natl Acad Sci USA 100:3380-3385.