2011 Josef C Uyeda & Stevan J Arnold (revised July 9, 2021)
Get your data into R by following the following steps.Save the entire Exercise 1.1 directory, including our data file: R_inland_snake_data.txt If needed, set your working directory by putting the path to the downloaded folder you created in the following statement. If you opened this R session and script from the Exercise directory, you can use relative paths and skip this step. Be sure and use the / not the  spacing convention!
library(dplyr)
setwd('./') #You may need to set your working directory to where you downloaded the files
Our dataset contains garter snake scale counts for mothers and their offspring. Scale counts are good for this sort of comparison because they do not vary with ontogeny. Why is this important? Why does this simplify estimating heritability from a parent-offspring regression?
thamnophis <- read.table('R_inland_snake_data.txt',header=TRUE)
thamnophis
Here, each family represents a gravid female snake that was caught and all of her offspring. The first entry for a given family is the mother’s trait data. The subsequent entries are her kids. We will use dplyr
to reorganize the dataset into two sets of columns for each family calculating the mother’s and the kids’ trait means. Male snakes tend to have higher vertebral counts than female snakes. This dataset has already had the effects of sex removed (which explains why some kids have fractional scale counts).
thamnophis <- group_by(thamnophis, family) # Use the variable `family` as a grouping variable
moms <- filter(thamnophis, row_number()==1) # Extract first row for each family
kids <- filter(thamnophis, row_number()!=1) %>% dplyr::summarize(., across(body:post, ~ mean(.x, na.rm=TRUE))) #Extract all but the first row for each family, and summarize by averaging all traits, from `body` to `post`.
families <- left_join(moms, kids, "family", suffix=c(".mom", ".kids")) #Re-join mom and kids
families
Now we can estimate heritabilities from a parent-offspring regression. Plot the mother and offspring body vertebrae and fit a linear regression. Then add a trendline from the linear regression fit.
plot(families$body.mom, families$body.kids, pch=19, col="blue", cex=1.5, xlab="Body (Dam)", ylab="Body (Offspring)")
lm1 <- lm(body.kids ~ body.mom, data=families)
abline(lm1, lwd=5, lty=2)
summary(lm1)
Call:
lm(formula = body.kids ~ body.mom, data = families)
Residuals:
Min 1Q Median 3Q Max
-7.7707 -1.5516 -0.0373 1.7237 8.0121
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 110.40290 10.30783 10.711 < 2e-16 ***
body.mom 0.33906 0.06204 5.465 1.9e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.698 on 149 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.167, Adjusted R-squared: 0.1614
F-statistic: 29.87 on 1 and 149 DF, p-value: 1.898e-07
h2 <- list() #heritability
Vp <- list() #phenotypic variance
Va <- list() #additive genetic variance
h2$body <- 2*unname(lm1$coefficients[2]) #Remember we only have 1 of the parents. Use the slope from the linear model
# by using `lm1$coefficients[2]`
Vp$body <- var(moms$body, na.rm=TRUE) # Could also include offspring, but non-independent samples
Va$body <- h2$body*Vp$body # Use equation for heritability
Now repeat the analysis for tail (and other variables).
fits <- list()
par(mfrow=c(2,3))
for(i in 3:8){ #iterate over each of the traits
trait <- colnames(thamnophis)[i]
fits[[trait]] <- lm(families[[i+6]] ~ families[[i]])
plot(families[[i]], families[[i+6]], pch=19, col="blue", cex=1.5, xlab="Dam", ylab="Offspring", main=trait)
abline(fits[[trait]], lty=2, lwd=5)
}
h2 <- sapply(fits, function(x) 2*unname(x$coefficients[2]))
h2
body tail mid ilab slab post
0.67811916 0.46110175 0.25110435 0.07257756 -0.08889161 0.35312629
How much confidence should we have in these estimates? One way to estimate uncertainty and confidence limits with minimal distributional assumptions is to conduct a bootstrap.
nreps <- 1000 # Number of bootstrap replicates
bootsample <- matrix(NA, nrow=nreps, ncol=6) # Create an empty matrix where we will store results
colnames(bootsample) <- colnames(thamnophis)[3:8] # Set the column names of the empty results matrix
for(i in 1:nreps){
samp <- sample(1:nrow(families), nrow(families), replace=TRUE) # Key step to a bootstrap: Resample families with replacement
.newfamilies <- families[samp,] #Use the resampled families to create a new dataset where different families are randomly reweighted in frequency
h2row <- list() #Stpre results for each trait
for(j in 3:8){ #iterate over each of the traits
trait <- colnames(thamnophis)[j]
h2row[[trait]] <- 2*unname(lm(.newfamilies[[j+6]] ~ .newfamilies[[j]])$coefficients[2])
}
bootsample[i,] <- do.call(c, h2row)
}
Now we can plot our resulting distributions:
par(mfrow=c(3,2))
for(i in 1:ncol(bootsample)){
hist(bootsample[,i], main=colnames(bootsample)[i], xlim=c(-0.5, 1.5), breaks=nreps/20, xlab="heritability")
abline(v=mean(bootsample[,i]), lty=2, lwd=3)
abline(v=0, lwd=2, col="red")
abline(v=1, lwd=2, col="red")
}
Here, we ignored variation in the sizes of families. How should that affect our analyses?