Estimate decay of linkage disequilibrium with distance

It is well known that linkage disequilibrium (LD) decays with distance. Several functions have been proposed to estimate such decay. Among the most widely used are the Hill and Weir (1) formula for describing the decay of r² and a formula proposed by Abecasis (2) for describing the decay of D’.
I wrote R functions to estimate decay of LD according to both the formulas for a paper I recently published (3). Please, refer to the original publications for details. Here I just use a non-linear model to fit the data do the decay function.

Estimate decay of r² with distance

Input:
n: sample size (i.e. number of sampled alleles)
LD.data: estimates of LD as r2 between pair of markers
distance: the distance between pair of markers. Can be in bp, Mbp, cM.
(note that LD.data and distance must be in the same order and of the same length since they represent respectively the LD values and distance of any pair of markers considered)

Output:
HW.nonlinear: object obtained after fitting the non-linear model
new.rho: estimate of population recombination parameter (which is actually C/distance)
fpoints: vector of LD obtained fitting the linear model. It has a one-to-one correspondence with the distance vector, i.e. the first element of fpointsis the LD estimate for the distance in the first element of distance and so on.

Below you find the commands, including some sample data. Any feedback is appreciated!

distance=c(19,49,1981,991,104,131,158,167,30,1000,5000,100,150,11,20,33,1100,1300,1500,100,1400,900,300,100,2000,100,1900,500,600,700,3000,2500,400,792)
LD.data=c(0.6,0.47,0.018,0.8,0.7,0.09,0.09,0.05,0,0.001,0,0.6,0.4,0.2,0.5,0.4,0.1,0.08,0.07,0.5,0.06,0.11,0.3,0.6,0.1,0.9,0.1,0.3,0.29,0.31,0.02,0.01,0.4,0.5)
n=52
HW.st=c(C=0.1)
HW.nonlinear=nls(LD.data~((10+C*distance)/((2+C*distance)*(11+C*distance)))*(1+((3+C*distance)*(12+12*C*distance+(C*distance)^2))/(n*(2+C*distance)*(11+C*distance))),start=HW.st,control=nls.control(maxiter=100))
tt=summary(HW.nonlinear)
new.rho=tt$parameters[1]
fpoints=((10+new.rho*distance)/((2+new.rho*distance)*(11+new.rho*distance)))*(1+((3+new.rho*distance)*(12+12*new.rho*distance+(new.rho*distance)^2))/(n*(2+new.rho*distance)*(11+new.rho*distance)))

Estimate decay of D’ with distance
Input:

n: sample size

LD.data: estimates of LD as D’ between pair of markers

distance: the distance between pair of markers. It is very important that distance is expressed in base pairs (bp). (note that LD.data and distance must be in the same order and of the same length since they represent respectively the LD values and distance of any pair of markers considered)
Output:

LD.nonlinear: object obtained after fitting the non-linear model

fpoints: vector of LD obtained fitting the linear model. It has a one-to-one correspondence with the distance vector, i.e. the first element of fpoints is the LD estimate for the distance in the first element of distance and so on.
Below you find the commands, including some sample data. Any feedback is appreciated!
distance=c(19,49,1981,991,104,131,158,167,30,1000,5000,100,150,11,20,33,1100,1300,1500,100,1400,900,300,100,2000,100,1900,500,600,700,3000,2500,400,792)
LD.data=c(0.6,0.47,0.018,0.8,0.7,0.09,0.09,0.05,0,0.001,0,0.6,0.4,0.2,0.5,0.4,0.1,0.08,0.07,0.5,0.06,0.11,0.3,0.6,0.1,0.9,0.1,0.3,0.29,0.31,0.02,0.01,0.4,0.5)
n=52
LD.st=c(b0=12.9)
distance.mb=distance/1000000
LD.nonlinear=nls(LD.data~(1-distance.mb)^b0,start=LD.st,control=nls.control(minFactor=1/1000000000,maxiter=100,warnOnly=T))
summ=summary(LD.nonlinear)
param=summ$parameters
beta0=param["b0","Estimate"]
fpoints=(1-distance.mb)^beta0

Elizabeth Scholl presented me with a quite important question. How to get the half-decay distance, i.e. the distance at which LD is half of its maximum value.

Here you have an approximate solution. However, if you have many comparisons it should work well.
df=data.frame(distance,fpoints)
maxld=max(LD.data)
#You could elucubrate if it's better to use the maximum ESTIMATED value of LD
#In that case you just set: maxld=max(fpoints)
h.decay=maxld/2
half.decay.distance=df$distance[which.min(abs(df$fpoints-h.decay))]

Of course, you might decide to plot your data. In the example I plot the actual LD points as points, and the estimated LD points as a dotted line.
ld.df=data.frame(distance,fpoints)
ld.df=ld.df[order(ld.df$distance),]
plot(distance,LD.data,pch=19,cex=0.9)
lines(ld.df$distance,ld.df$fpoints,lty=3,lwd=1.2)

References:
(1) Hill WG, Weir BS (1988) Variances and covariances of squared linkage disequilibria in finite populations. Theor Popul Biol 33:54–78

(2) Abecasis GR et al  (2001) Extent and distribution of linkage disequilibrium in three genomic regions. Am J Hum Genet 68:191–197

(3) Marroni et al (2011) Nucleotide diversity and linkage disequilibrium in Populus nigra cinnamyl alcohol dehydrogenase (CAD4) gene. Tree Genetics & Genomes, DOI 10.1007/s11295-011-0391-5.
Special thanks:

To Vittorio Zamboni and Davide Scaglione for bringing archival research to the next level!

		
	
			
			
			Share this:
Facebook
X
Like Loading...


	Related
										

					
						
							R
							nls, original publications
						

						
							August 9, 2011
							59 Comments

59 comments

Hasan Alhaddad

Hi Fabio

Thanks for the nice post. I am working on estimating LD in cats and I used the decay model you used with little modification. Thanks a lot. I was wondering if I can take a look at the decay function of D`. I would also suggest selecting independent pairwise comparisons only in the model. Thanks again

Hasan Alhaddad
PhD student
UC Davis
halhaddad@ucdavis.edu

Fabio Marroni's Blog

Estimate decay of linkage disequilibrium with distance

Share this:

Related

59 comments

Leave a comment Cancel reply

Archives

Blogroll

Meta