Introgression map
Posterior probabilities of Neanderthal introgression at each position in the genome for each CEU and CHB+CHS individual from the 1000 Genomes Project (phase 1, hg19), and called tracts (threshold 0.42) for each individual (BED format).
Link to cloud storage of the data.
Description of the data
We built introgression maps for all CEU, CHB, and CHS individuals from the 1000 Genomes (phase 1, hg19) data. The whole set of YRI individuals from that dataset was used as the 'Neandertal-free' reference. We used chromosome-specific recombination rates obtained from the deCODE homepage. The details of the analysis can be found in this manuscript.
Posterior Probabilities
The directory Posterior Probabilities contains two subdirectories, April2015 and March2018. We first built introgression maps in April 2015, and the corresponding data can be found in the subdirectory April2015. We refined the analysis pipeline in March 2018 and repeated the analysis; the corresponding data can be found in the subdirectory March2018. The analysis based on the refined pipeline shows a slight increase in the average level of Neanderthal introgression, but the results of the downstream analyses reported in the manuscript were not affected substantially.
An example filename is CEU_lax_chrX.tar.gz. The first component of the name, CEU, is the population; files named CHBS contain the maps for CHB and CHS. The second component, 'lax', describes the mappability filter used for the Neandertal individual in that analysis. The filters are explained in this paper (method summary): 'lax' is the 50% filter and 'strict' the 95% filter. We found that in our case, the 'lax' filter was more conservative when calling introgression than the 'strict' filter. The last component of the filename is the chromosome.
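As an illustration of the naming scheme, the following minimal Python sketch splits an archive name into its three components. The function name parse_archive_name and the regular expression are hypothetical and inferred only from the single example CEU_lax_chrX.tar.gz; they are not part of diCal-admix.

```python
import re

# Assumed pattern: <population>_<filter>_chr<chromosome>.tar.gz,
# inferred from the example name "CEU_lax_chrX.tar.gz".
ARCHIVE_RE = re.compile(r"^([A-Za-z]+)_([A-Za-z]+)_chr([0-9A-Za-z]+)\.tar\.gz$")


def parse_archive_name(name):
    """Split an archive name into population, filter, and chromosome."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError("unexpected archive name: " + name)
    population, filter_name, chromosome = match.groups()
    return {
        "population": population,   # e.g. "CEU" or "CHBS"
        "filter": filter_name,      # e.g. "lax" or "strict"
        "chromosome": chromosome,   # e.g. "18" or "X"
    }
```

For example, parse_archive_name("CEU_lax_chrX.tar.gz") yields the population "CEU", the filter "lax", and the chromosome "X".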
Once you extract a file, a subdirectory with the same name is created. In it you find a chr*.pos file and several files named after individual IDs. For example, the file NA12154.18.1.filtered contains the introgression map for individual NA12154 on chromosome 18, second haplotype (the file for the first haplotype would have a 0 in place of the 1). The file "individuals.txt" on the cloud storage lists the IDs for the different populations. Note that on the X chromosome, some individuals have two haplotypes and others have only one.
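The per-haplotype file names can be decoded in the same spirit. The sketch below is a hypothetical helper (parse_haplotype_filename is not part of the distributed data or software) and assumes that every such file follows the dot-separated layout of the example NA12154.18.1.filtered.

```python
def parse_haplotype_filename(name):
    """Decode '<individual>.<chromosome>.<haplotype index>.filtered'."""
    parts = name.split(".")
    if len(parts) != 4 or parts[3] != "filtered":
        raise ValueError("unexpected file name: " + name)
    individual, chromosome, index = parts[0], parts[1], int(parts[2])
    if index not in (0, 1):
        raise ValueError("haplotype index must be 0 or 1: " + name)
    return {
        "individual": individual,
        "chromosome": chromosome,
        "haplotype_index": index,       # 0 = first haplotype, 1 = second
        "haplotype_number": index + 1,  # 1-based, for readability
    }
```

Keeping both the 0-based index from the file name and a 1-based number avoids confusion when the first and second haplotypes are discussed in prose.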
The format is as follows: the pos file and each *.filtered file contain a single column of numbers, and all columns have the same length. Each entry in the pos file gives a position on the DNA sequence (in bp), and the entry on the same line of a *.filtered file gives the probability that the genetic variation of the focal haplotype of the focal individual at this position is introgressed from Neandertal. We report the introgression probabilities at a resolution of 500 bp.
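A minimal Python sketch of reading such a pair of files is given here, assuming plain-text files with one number per line and no header. The helper names read_column and load_introgression_map are hypothetical, and the paths in the usage note are placeholders.

```python
def read_column(path):
    """Read a one-column text file of numbers, skipping blank lines."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def load_introgression_map(pos_path, filtered_path):
    """Pair each position (bp) with its posterior introgression probability."""
    # Positions are parsed via float() first in case they are written as "500.0".
    positions = [int(value) for value in read_column(pos_path)]
    probabilities = read_column(filtered_path)
    if len(positions) != len(probabilities):
        raise ValueError(
            "column lengths differ: %d positions, %d probabilities"
            % (len(positions), len(probabilities))
        )
    return list(zip(positions, probabilities))
```

A call such as load_introgression_map("chr18.pos", "NA12154.18.1.filtered") would then return a list of (position, probability) pairs for that haplotype.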
Called Tracts
In the directory Called Tracts, we provide called tracts for the results based on the refined pipeline (subdirectory March2018). The regions of Neanderthal introgression for each individual haplotype are provided in the BED file format. To obtain these regions, we applied a threshold of 0.42 to the posterior probabilities and reported the regions that exceeded this threshold (see manuscript for details). The naming scheme of the files follows the same convention as for the posterior probabilities.
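The thresholding step can be illustrated with a short Python sketch. This is not the pipeline used to produce the provided BED files: the function call_tracts is hypothetical, it simply merges consecutive grid points whose posterior probability is strictly above the threshold, and it reports the first and last grid position of each run. How the published tracts translate grid points into BED start and end coordinates is not stated here, so that conversion is left out.

```python
def call_tracts(positions, probabilities, threshold=0.42):
    """Return (first_position, last_position) for each run above the threshold."""
    tracts = []
    start = None  # first position of the current run, if any
    last = None   # most recent position inside the current run
    for position, probability in zip(positions, probabilities):
        if probability > threshold:
            if start is None:
                start = position
            last = position
        elif start is not None:
            # The run ended at the previous grid point.
            tracts.append((start, last))
            start = None
    if start is not None:
        # Close a run that extends to the end of the chromosome.
        tracts.append((start, last))
    return tracts
```

With positions 500, 1000, 1500, 2000, 2500, 3000 and probabilities 0.1, 0.5, 0.9, 0.2, 0.43, 0.42, the function returns the runs (1000, 1500) and (2500, 2500); the last grid point is excluded because its probability equals the threshold rather than exceeding it.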
Please contact the developers if you have additional questions about the analysis. If you use this data in further analyses, please cite the manuscript or this website.
Users
Download the diCal-admix jar file (available) and source code (not available yet)