diCal-admix

Matthias Steinrücken1,2,3, Jeffrey P. Spence4, John A. Kamm5, Emilia Wieczorek6, and Yun S. Song3,5,7

1Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
2Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, USA
3Department of EECS, University of California, Berkeley, CA, USA
4Computational Biology Graduate Group, University of California, Berkeley, CA, USA
5Department of Statistics, University of California, Berkeley, CA, USA
6Department of Mathematics, University of California, Berkeley, CA, USA
7Chan Zuckerberg Biohub, San Francisco, CA, USA

Introgression map

Posterior probabilities of Neanderthal introgression at each position in the genome for each CEU and CHB+CHS individual from the 1000 Genomes Project (phase 1, hg19), together with called tracts (threshold 0.42) for each individual in BED format.

Link to cloud storage of the data.

Description of the data

We built introgression maps for all CEU, CHB, and CHS individuals from the 1000 Genomes Project (phase 1, hg19) data. The full set of YRI individuals from that dataset served as the 'Neanderthal-free' reference. We used chromosome-specific recombination rates obtained from the deCODE homepage. Details of the analysis can be found in this manuscript.

Posterior Probabilities

The directory Posterior Probabilities contains two subdirectories, April2015 and March2018. We first built introgression maps in April 2015; the corresponding data are in the subdirectory April2015. We refined the analysis pipeline in March 2018 and repeated the analysis; the corresponding data are in the subdirectory March2018. The analysis based on the refined pipeline shows a slight increase in the average level of Neanderthal introgression, but the results of the downstream analyses reported in the manuscript were not affected substantially.

An example filename is CEU_lax_chrX.tar.gz. The first component, CEU, is the population; files named CHBS contain the maps for CHB and CHS. The second component, 'lax', names the mappability filter applied to the Neanderthal individual in that analysis. The filters are explained in this paper (method summary): 'lax' is the 50% filter and 'strict' is the 95% filter. In our case, the 'lax' filter turned out to be more conservative than the 'strict' filter when calling introgression. The last component is the chromosome.
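The naming convention above can be split into its three components programmatically. The following is a minimal Python sketch, not part of the diCal-admix software; the function name parse_archive_name and the regular expression are illustrative assumptions based only on the example filename and the population and filter labels mentioned above.

```python
import re
from typing import NamedTuple


class ArchiveName(NamedTuple):
    population: str  # "CEU", or "CHBS" for CHB and CHS combined
    mask: str        # "lax" (50% filter) or "strict" (95% filter)
    chromosome: str  # e.g. "1", "18", "X"


# Assumed pattern, derived from the example CEU_lax_chrX.tar.gz.
_ARCHIVE_PATTERN = re.compile(r"^(CEU|CHBS)_(lax|strict)_chr(\w+)\.tar\.gz$")


def parse_archive_name(filename: str) -> ArchiveName:
    """Split an archive filename into population, filter, and chromosome."""
    match = _ARCHIVE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    return ArchiveName(*match.groups())
```

For the example above, parse_archive_name("CEU_lax_chrX.tar.gz") yields population "CEU", filter "lax", and chromosome "X".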

Extracting a file creates a subdirectory with the same name. It contains a chr*.pos file and several files named after individual IDs. For example, the file NA12154.18.1.filtered contains the introgression map for individual NA12154 on chromosome 18, second haplotype (the file for the first haplotype would have a zero in place of the one). The file "individuals.txt" on the cloud storage lists the IDs for the different populations. Note that on the X chromosome, some individuals have two haplotypes and others only one.
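The per-haplotype filenames can be decoded in the same spirit. This is a small sketch under the assumption that individual IDs contain no dots, so that the name always splits into exactly four dot-separated fields; the helper name parse_haplotype_filename is hypothetical.

```python
from typing import Tuple


def parse_haplotype_filename(name: str) -> Tuple[str, str, int]:
    """Return (individual ID, chromosome, haplotype index) for a *.filtered file.

    The haplotype index is 0 for the first haplotype and 1 for the second.
    Assumes the individual ID itself contains no dots.
    """
    parts = name.split(".")
    if len(parts) != 4 or parts[3] != "filtered" or parts[2] not in ("0", "1"):
        raise ValueError(f"unexpected haplotype filename: {name!r}")
    individual, chromosome, haplotype = parts[0], parts[1], int(parts[2])
    return individual, chromosome, haplotype
```

Applied to NA12154.18.1.filtered, this returns the individual NA12154, chromosome 18, and haplotype index 1 (the second haplotype).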

The format is as follows: the pos-file and each *.filtered file contain a single column of numbers, and all columns have the same length. Each entry in the pos-file gives a position on the DNA sequence (in bp), and the corresponding entry in the *.filtered file gives the probability that the genetic variation of the focal haplotype of the focal individual at this position is introgressed from Neanderthals. Introgression probabilities are reported at 500 bp resolution.
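Because the two files are aligned line by line, reading them amounts to pairing the n-th position with the n-th probability. The sketch below assumes plain text with one number per line and no header, and that positions can be read as integers; the function name is illustrative and not part of any released tool.

```python
from typing import Iterable, List, Tuple


def pair_positions_with_probabilities(
    pos_lines: Iterable[str], prob_lines: Iterable[str]
) -> List[Tuple[int, float]]:
    """Pair each position (bp) with its posterior introgression probability.

    Both arguments are iterables of text lines, e.g. open file handles.
    Blank lines are skipped; the two columns must have equal length.
    """
    # int(float(...)) tolerates positions written as "500" or "500.0".
    positions = [int(float(line)) for line in pos_lines if line.strip()]
    probabilities = [float(line) for line in prob_lines if line.strip()]
    if len(positions) != len(probabilities):
        raise ValueError(
            f"length mismatch: {len(positions)} positions, "
            f"{len(probabilities)} probabilities"
        )
    return list(zip(positions, probabilities))
```

In practice one would pass two open file handles, for instance the chr*.pos file and one *.filtered file from the same extracted subdirectory.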

Called Tracts

In the directory Called Tracts, we provide called tracts for the results based on the refined pipeline (subdirectory March2018). The regions of Neanderthal introgression for each individual haplotype are provided in the BED-file format. To obtain these regions, we applied a threshold of 0.42 to the posterior probabilities and reported the regions that exceeded this threshold (see manuscript for details). The naming scheme of the files follows the same convention as for the posterior probabilities.
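The thresholding step can be illustrated with a short sketch that finds maximal runs of consecutive positions whose posterior probability exceeds 0.42. This is a simplified illustration only; the exact procedure used to produce the provided BED files is described in the manuscript and may differ, and the function name call_tracts is hypothetical. The sketch reports the first and last position of each run and deliberately does not convert them to BED coordinates, since the coordinate convention of the pos-file is not specified here.

```python
from typing import List, Sequence, Tuple


def call_tracts(
    positions: Sequence[int],
    probabilities: Sequence[float],
    threshold: float = 0.42,
) -> List[Tuple[int, int]]:
    """Return (first_position, last_position) for each maximal run of
    consecutive entries whose probability strictly exceeds the threshold."""
    tracts = []
    run_start = None   # first position of the current run, if any
    run_end = None     # most recent position seen inside the current run
    for position, probability in zip(positions, probabilities):
        if probability > threshold:
            if run_start is None:
                run_start = position
            run_end = position
        elif run_start is not None:
            # The run just ended; record it and reset.
            tracts.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        # The last run extends to the end of the data.
        tracts.append((run_start, run_end))
    return tracts
```

Entries exactly equal to the threshold are not counted as exceeding it, so they end a run.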

Please contact the developers if you have additional questions about the analysis. If you use these data in further analyses, please cite the manuscript or this website.

Users

Download diCal-admix jar-file (available) and source code (not yet available)

Project Information

About this project:

This is the diCal-admix project ("dical-admix")

This project is hosted by SourceForge.net. The project team describes it as:

Software for admixture tract detection based on the diCal model (jar-file available, source code will be added soon). This project website also provides posterior probabilities of Neanderthal introgression at each position in the genome for the CEU and CHB+CHS individuals from the 1000 Genomes Project.