iBLUP - imputation by best linear unbiased prediction

 

1 Introduction

iBLUP is a novel genotype imputation program with the following merits:

1) it works with both next-generation sequencing data and microarray data;

2) it tolerates high missing rates and rare variants;

3) it uses identity-by-descent and linkage disequilibrium information simultaneously and comprehensively;

4) it estimates the imputation accuracy at the same time.

 

2 Program

iBLUP can be run on both Linux and Windows platforms. It is freely available and can be downloaded here.

2.1 Linux 64-bit platform

Step 1: Download the iBLUP program and grant execute permission to iBLUP. (Last update: 20 May, 2013)

Step 2: Unzip the Intel dynamic link library file and copy all the files (.so files) to /usr/lib/.

 

2.2 Windows 64-bit or 32-bit platform

Step 1: Install ActivePerl (http://www.activestate.com/activeperl) before using iBLUP.

Step 2: Download the iBLUP program. (Last update: 20 May, 2013)

Step 3: Unzip the dynamic link library file (64-bit or 32-bit) and put it in the same folder as iBLUP.

 

3 Example data

Next generation sequencing data

Genotype data from DNA chips

 

4 Running iBLUP

4.1 Next generation sequencing data

perl iBLUP.pl [vcf filename] [sample number] [minimum genotyping number] [LD threshold] [convergence threshold]

1) vcf filename: the genotype file created by SAMtools or GATK (please download the example data).

2) sample number: the number of individuals included in the vcf file.

3) minimum genotyping number: the minimum number of individuals whose genotype calling value (Q value) is no less than 20 (we recommend 1/3 of the total sample number).

4) LD threshold: the LD threshold used to define a block (we recommend 0.1).

5) convergence threshold: the convergence threshold for the kinship iteration (we recommend 0.001).

 

Example for Linux 64-bit: perl iBLUP.pl example_next_generation_sequencing_data.vcf 72 20 0.1 0.001 # arguments are separated by spaces

Example for Windows 64-bit and 32-bit: perl iBLUP_win.pl example_next_generation_sequencing_data.vcf 72 20 0.1 0.001 # arguments are separated by spaces
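
The sample number and the minimum genotyping number can be derived from the VCF file instead of being counted by hand. The following is a minimal sketch in Python and is not part of iBLUP. It counts the sample columns in the #CHROM header line (in a VCF file the sample columns follow the nine fixed columns), takes one third of that count as recommended above, and assembles the arguments in the order used by the Linux example. The function names are hypothetical, and rounding the one-third value down is an assumption.

```python
import io

FIXED_VCF_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def count_vcf_samples(handle):
    """Return the number of sample columns declared in the #CHROM header line."""
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            return max(len(fields) - FIXED_VCF_COLUMNS, 0)
    raise ValueError("no #CHROM header line found")


def build_iblup_args(vcf_name, n_samples, ld=0.1, conv=0.001):
    """Assemble the argument list in the order: file, sample number,
    minimum genotyping number, LD threshold, convergence threshold."""
    # Assumption: 1/3 of the sample number, rounded down, at least 1.
    min_genotyped = max(n_samples // 3, 1)
    return ["perl", "iBLUP.pl", vcf_name, str(n_samples),
            str(min_genotyped), str(ld), str(conv)]


# Tiny in-memory VCF header with three hypothetical samples.
demo = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
)
n = count_vcf_samples(demo)
print(build_iblup_args("example.vcf", n))
```

For a real file, pass an open file handle (for example `open("my.vcf")`) to `count_vcf_samples` and hand the resulting list to `subprocess.run`.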

 

4.2 Genotype data from DNA chips

perl iBLUP_chip.pl [genotype filename] [sample number] [LD threshold] [convergence threshold]

1) genotype filename: the genotype file, with SNPs sorted by chromosome and by position on the chromosome. Genotypes are coded as "0" for the homozygous reference genotype, "1" for the heterozygous genotype, "2" for the homozygous alternative genotype and "?" for a missing genotype (please download the example data).

2) sample number: the number of individuals included in the genotype file.

3) LD threshold: the LD threshold used to define a block (we recommend 0.1).

4) convergence threshold: the convergence threshold for the kinship iteration (we recommend 0.001).

 

Example for Linux 64-bit: perl iBLUP_chip.pl example_chip_data.txt 50 0.1 0.001 # arguments are separated by spaces

Example for Windows 64-bit and 32-bit: perl iBLUP_chip_win.pl example_chip_data.txt 50 0.1 0.001 # arguments are separated by spaces
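
Before running iBLUP on chip data it can help to check that a marker's genotypes use only the codes described above and that their count matches the sample number argument. The following is a minimal sketch in Python and is not part of iBLUP. Because the exact file layout (delimiter, extra columns) is not described here, the functions take one marker's codes as an already-split list; the function names are hypothetical.

```python
VALID_CODES = {"0", "1", "2", "?"}  # genotype coding described in section 4.2


def summarize_genotypes(codes):
    """Check one marker's genotype codes and return (n_called, missing_rate)."""
    if not codes:
        raise ValueError("no genotype codes given")
    bad = [c for c in codes if c not in VALID_CODES]
    if bad:
        raise ValueError(f"unexpected genotype codes: {sorted(set(bad))}")
    n_missing = sum(1 for c in codes if c == "?")
    return len(codes) - n_missing, n_missing / len(codes)


def check_sample_number(codes, sample_number):
    """True if the number of codes matches the [sample number] argument."""
    return len(codes) == sample_number


row = ["0", "1", "?", "2", "2"]  # hypothetical marker typed in five individuals
called, missing_rate = summarize_genotypes(row)
print(called, missing_rate)
print(check_sample_number(row, 5))
```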

 

5 Output files

5.1 Next generation sequencing data

1) "genotype_output.txt" is the imputed genotype file. Each genetic marker is given on a single line. Columns 1-4 give the marker's chromosome, position and SNP genotypes. The other columns are the individual genotype values, which are continuous variables between 0 and 2 ("0" for the homozygous reference genotype, "1" for the heterozygous genotype and "2" for the homozygous alternative genotype).

2) "information.txt" is an output file that shows the number of SNPs and the accuracy of imputation.

5.2 Genotype data from DNA chips

1) "genotype_output.txt" is the imputed genotype file. Each genetic marker is given on a single line. Column 1 gives the marker's name. The other columns are the individual genotype values, which are continuous variables between 0 and 2 ("0" for the homozygous reference genotype, "1" for the heterozygous genotype and "2" for the homozygous alternative genotype).

2) "information.txt" is an output file that shows the number of SNPs and the accuracy of imputation.
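
Because the imputed genotype values are continuous between 0 and 2, downstream tools that expect discrete genotypes need a conversion step. The following is a minimal sketch in Python and is not part of iBLUP. It maps each value to the nearest of 0, 1 and 2; this nearest-genotype rule is an assumption for illustration, not a rule prescribed by iBLUP, and the function names are hypothetical. The values are passed as an already-split list because the column separator of the output file is not described here.

```python
def hard_call(value):
    """Map a continuous genotype value in [0, 2] to the nearest of 0, 1, 2."""
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"genotype value out of range: {value}")
    # Round half up; the built-in round() rounds halves to the even integer.
    return int(value + 0.5)


def hard_call_row(values):
    """Convert the genotype columns of one marker line (as strings)."""
    return [hard_call(float(v)) for v in values]


# Hypothetical genotype columns taken from one line of genotype_output.txt.
print(hard_call_row(["0.03", "0.97", "1.62", "2"]))
```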

 

6 Contact us

We are available via phone and email:

Yumei Yang, 862134206147, yangyumei2818@sina.com

Yuchun Pan, PI, panyuchun1963@aliyun.com