Tutorial 1 - Trait evolution on trees

Install and Load Required Packages

You only need to install packages once. Skip these lines if they’re already installed.

# install.packages("phytools")
# install.packages("geiger")
# install.packages("BAMMtools)

library(phytools)
library(geiger)
library(BAMMtools)

Tutorial Overview

In this tutorial, we will:

Explore the Brownian Motion (BM) model of evolution
Perform ancestral character reconstruction for continuous traits
Visualize trait disparity through time (DTT)
Fit and compare BM model variants (e.g., early burst, late burst)
Assess phylogenetic signal
Explore rate heterogeneity across and within lineages
- Compare single- vs. multi-rate BM models with predefined regimes
- Bonus: Use BAMM to detect rate shifts without a priori regimes

We’ll use both simulated data and an empirical dataset on vipers from Alencar et al. (2017).

1. Comparing Simulated Traits Under Different Evolutionary Rates

In this section, we will explore how changing the rate of evolution (σ²) affects trait distributions under a Brownian Motion model of evolution.

Simulating a Phylogenetic Tree

We’ll start by simulating a pure-birth phylogenetic tree with 100 species.

set.seed(123)
tree <- pbtree(n = 100)
plot(tree, cex = 0.5)
axisPhylo()

Simulating Traits Under Different Rates

Now we simulate trait evolution using three different evolutionary rates:

Low (σ² = 0.0001)
Intermediate (σ² = 0.1)
High (σ² = 1)

trait.1 <- fastBM(tree, sig2 = 0.0001)
trait.2 <- fastBM(tree, sig2 = 0.1)
trait.3 <- fastBM(tree, sig2 = 1)

Visualizing Trait Evolution

We’ll use a multi-panel layout to visualize how rate differences affect both the temporal pattern of trait evolution (phenograms) and trait value distributions (histograms).

par(mfrow = c(3,2), mar = c(4, 4, 2, 1))

#Trait with very low rate
phenogram(tree, trait.1, ftype = "off", lwd = 2,
          xlab = "Time", ylab = "Trait value",
          main = "BM with low rate (σ² = 0.0001)")
hist(trait.1, xlab = "Trait value")

#Trait with intermediate rate
phenogram(tree, trait.2, ftype = "off", lwd = 2,
          xlab = "Time", ylab = "Trait value",
          main = "BM with intermediate rate (σ² = 0.1)")
hist(trait.2, xlab = "Trait value")

#Trait with high rate
phenogram(tree, trait.3, ftype = "off", lwd = 2,
          xlab = "Time", ylab = "Trait value",
          main = "BM with high rate (σ² = 1)")
hist(trait.3, xlab = "Trait value")

# Reset plot layout
par(mfrow = c(1,1))

Q: What changes across the plots?

Let’s look at the variance of each trait set:

var(trait.1)
var(trait.2)
var(trait.3)

Although all simulations use the same phylogenetic tree, higher σ² results in greater trait variance. This occurs because traits accumulate evolutionary change more rapidly under higher rates.

2. Reconstructing Ancestral Character States

One common goal in evolutionary biology is to estimate how traits looked in the past. For continuous traits, the most widely used method assumes Brownian Motion (BM) evolution and estimates ancestral values for internal nodes of a phylogeny. In this section, we’ll use an empirical dataset: body size in vipers (family Viperidae), along with a time-calibrated phylogeny from Alencar et al. 2016, Alencar et al. 2017

#Load body size data for vipers
bs.vipers <- read.csv2("vipers_habitat_bodysize_mod.csv", row.names = 1)

#Take a look
head(bs.vipers)

#Number of species in the dataset
length(bs.vipers$species)

#Load and plot the phylogeny
vipers.tree <- read.tree("vipers_tree.tre")
plot(vipers.tree, cex = 0.1)
axisPhylo()

#An alternative tree layout
plot(vipers.tree, cex = 0.1, type = "fan")

#Prepare trait vector for analysis
bs.vipers.vec <- setNames(bs.vipers$body.size, bs.vipers$species)

#Visualize trait values across the tree using a phenogram
phenogram(vipers.tree, bs.vipers.vec, ftype = "off", lwd = 2,
          xlab = "Time", ylab = "Body size")

We can spot some very large-bodied vipers!

Performing Ancestral State Reconstruction

We’ll use a method based on maximum likelihood (ML), which estimates the internal node values that make the observed (tips) data most likely, under a Brownian Motion model. There are multiple functions available to do this. Here, we will use fastAnc(), which is efficient and widely used for continuous traits.

par(mfrow = c(1,1), mar = c(1, 1, 1, 1))

fit.vipers.bs <- fastAnc(vipers.tree, bs.vipers.vec, vars = TRUE, CI = TRUE)

#The following lines returns estimated ancestral states, variances, and confidence intervals (not shown here):
#fit.vipers.bs$ace
#fit.vipers.bs$var

#Let's visualize the reconstructed traits on the tree using contMap()
vipers.contMap <- contMap(vipers.tree, bs.vipers.vec, fsize = 0.3, lwd = 1, outline = FALSE)

dev.off()

keep in mind: We’re inferring past values based only on present-day species, so reconstructions can carry high uncertainty - especially closer to the root.

3. Disparity Through Time (DTT) Plots

In this section, we’ll use both simulated and empirical data to explore Disparity Through Time (DTT), a method for visualizing how trait variation accumulates among lineages over evolutionary time.

Simulate a tree and trait under Brownian Motion

set.seed(123)
tree <- pbtree(n = 200)
trait <- fastBM(tree)

Run and plot the DTT

dtt.bm <- dtt(tree, trait, nsim = 100, plot = TRUE)

Under Brownian Motion, most disparity tends to arise early in the tree, because subclades quickly diverge. This leads to a gradual “decline” in the relative disparity curve through time, which is what the DTT plot reflects under the null model.

Sampling error can strongly affect DTT plots

As discussed in class, sampling error refers to measurement uncertainty in trait values at the tips of the tree (e.g., species). Even if a trait evolves exactly under BM, the observed data may not reflect that due to:

Small sample sizes
Within-species variation
Methodological or measurement imprecision

Lets explore how adding sampling error changes DTT curves.

We’ll use a function from Liam Revell (more info here)

par(mfrow = c(1,3))

#Revell's function to simulate a trait under BM adding sampling error
sim.error <- function(tree, k = 0.1) { 
  x <- fastBM(tree)
  v <- k * setNames(rexp(n = Ntip(tree)), names(x))
  sampleFrom(x, v, n = rep(1, Ntip(tree)))  
}

#DTT with relatively low sampling error
trait.low.error <- sim.error(tree, k = 0.05)
DTT.low <- dtt(tree, trait.low.error, plot = TRUE)

#DTT with relatively moderate sampling error
trait.mod.error <- sim.error(tree, k = 0.5)
DTT.mod <- dtt(tree, trait.mod.error, plot = TRUE)

#DTT with relatively high sampling error
trait.high.error <- sim.error(tree, k = 1.5)
DTT.high <- dtt(tree, trait.high.error, plot = TRUE)

As sampling error increases, the DTT curves deviate more from expectations under BM. This demonstrates how noise can inflate apparent disparity, particularly near the present.

Now let’s apply DTT to the viper body size dataset:

dtt.vipers <- dtt(vipers.tree, bs.vipers.vec, nsim = 100, plot = TRUE)

Q: What does the viper DTT curve suggest about body size evolution? Look for: (1) How early versus late disparity accumulates; (2) Whether (and if) the curve deviates above or below the null BM expectation.

4. Do traits always evolve under constant-rate Brownian Motion?

Not necessarily! Many traits do not follow a constant rate of evolution. For instance, in classic “adaptive radiation” scenarios, trait disparity tends to accumulate rapidly early in the clade’s radiation and slowdown toward the present.

To capture these dynamics, we can fit alternative models where the rate of evolution (σ²) changes over time.

Early-Burst (EB): The rate of evolution decreases exponentially through time
Late-Burst (LB): The rate increases exponentially through time
BM: Constant rate of evolution through time

Simulate and visualize these scenarios

#Simulate a tree with 50 species
set.seed(123)
tree <- pbtree(n = 50)

#Rescale the tree under Early and Late Burst dynamics
par(mfrow = c(1, 3))

plot(tree, main = "Original tree")
tree.eb <- rescale(tree, model = "EB", a = 3)
plot(tree.eb, main = "Early Burst")
tree.lb <- rescale(tree, model = "EB", a = -3)
plot(tree.lb, main = "Late Burst")

#Simulate trait evolution on each tree
trait.bm <- fastBM(tree)       #BM: constant rate
trait.eb <- fastBM(tree.eb)    #EB: rapid early evolution
trait.lb <- fastBM(tree.lb)    #LB: acceleration over time

par(mfrow = c(1, 3))
phenogram(tree, trait.bm, main = "Brownian Motion")
phenogram(tree.eb, trait.eb, main = "Early Burst")
phenogram(tree.lb, trait.lb, main = "Late Burst")

Q: Can you note the differences in how trait values spread over time?

Fit and compare BM vs. EB using vipers dataset

fit.bm <- fitContinuous(vipers.tree, bs.vipers.vec, model = "BM")  #Constant-rate BM
fit.eb <- fitContinuous(vipers.tree, bs.vipers.vec, model = "EB")  #Early/Late Burst

Reminder: The EB model includes both early and late burst as special cases. We just need to look at the estimated “a” parameter: - a < 0 -> early burst - a > 0 -> late burst - a = 0 -> similar to BM

fits <- list(BM = fit.bm, EB = fit.eb) #Compare model fits using AIC

data.frame(
  model = c("BM", "EB"),
  logLik = sapply(fits, logLik),
  AIC = sapply(fits, AIC)
)

Q: Which model fits the viper body size data best? (Lower AIC values indicate better support).

#Bonus: check the "a" parameter from the EB model
fit.eb$opt$a

Q: Does it suggest an early or late burst? (If the estimate is close to 0, then the pattern is essentially BM).

Tip: Use “?fitContinuous” to see all available models (including OU, lambda, delta, white noise).

5. Exploring Phylogenetic Signal

Let’s now explore the concept of phylogenetic signal, the tendency for related species to resemble each other more than expected by chance Revell et al. 2008.

We’ll use two commonly used metrics to quantify phylogenetic signal: Blomberg’s K & Pagel’s λ (lambda).

Simulating a Tree and Trait

We’ll begin by simulating a phylogenetic tree and a continuous trait evolving under brownian motion.

set.seed(123)
tree <- pbtree(n = 50) #Simulate tree with 50 tips
trait <- fastBM(tree) #Simulate trait evolving under BM

Now we’ll use the phylosig() function from the phytools package to estimate both K and λ. Check ?phylosig if you’d like more detail on the arguments.

Blomberg’s K

simul.tree.K <- phylosig(tree, trait, method = "K", test = TRUE) 
simul.tree.K
plot(simul.tree.K, las = 1, cex.axis = 0.9)

Interpretation:

K ≈ 1: trait evolution matches Brownian motion expectations.
K > 1: closely related species are more similar than expected - strong phylogenetic signal.
K < 1: trait values are more “weakly” determined by shared ancestry - weak signal.

The test (test = TRUE) performs randomizations to see if the observed K is significantly different from a null model.

Pagel’s λ

simul.tree.lambda <- phylosig(tree, trait, method = "lambda", test = TRUE) 
simul.tree.lambda
plot(simul.tree.lambda, las = 1, cex.axis = 0.9) #Likelihood surface

Interpretation:

λ = 1: trait evolution is consistent with Brownian motion.
λ = 0: traits evolve “independently” of phylogeny (no signal).

Pagel’s λ uses maximum likelihood to estimate λ and tests whether it’s significantly different from 0 or 1. The plot is showing the likelihood surface for λ and it is good to visualize the best-fit estimate, in this case, close to 1.

Phylogenetic signal using empirical data (viper dataset)

We’ll use the body size data (bs.vipers.vec) and a phylogeny (vipers.tree) from Alencar et al. (2017), that you already imported and formatted above.

Blomberg’s K

vipers.K <- phylosig(vipers.tree, bs.vipers.vec, method = "K", test = TRUE) 
vipers.K
plot(vipers.K, las = 1, cex.axis = 0.9)

Pagel’s λ

vipers.lambda <- phylosig(vipers.tree, bs.vipers.vec, method = "lambda", test = TRUE) 
vipers.lambda
plot(vipers.lambda, las = 1, cex.axis = 0.9)

Q: 1. What do the results suggest about the strength of phylogenetic signal in body size for vipers? 2. Do K and λ lead to similar conclusions?

6. Exploring Rate Heterogeneity

Up to this point, we have assumed that all lineages evolve under the same rate of trait evolution. However, this is not always biologically realistic. Some clades may evolve faster or slower than others due to, for example, ecological opportunity, key innovations, constraints, or environmental pressures.

We will start by exploring rate heterogeneity, variation in the pace of evolution between lineages, using the ratebytree() function from the phytools package.

Simulate Trees with Different Trait Evolution Rates

We’ll simulate two datasets that share the same tree topology, but differ in the rate of trait evolution.

set.seed(123)

#Simulate two identical trees
tree1 <- pbtree(n = 50) 
tree2 <- tree1  #identical topology and branch lengths

#Simulate traits evolving under different Brownian motion rates
trait1 <- fastBM(tree1, sig2 = 0.01)  #slow evolution
trait2 <- fastBM(tree2, sig2 = 1.0)  #fast evolution

#Estimating and Comparing Evolutionary Rates
rate_result <- ratebytree(c(tree1, tree2), list(trait1, trait2))
rate_result

Interpretation:

The output summarizes the results of two models:

single/common-rate model (null hypothesis): Assumes the same evolutionary rate for both datasets (one σ² for both).
Multi-rate model: Allows each tree to have its own σ².

The output shows: Estimated evolutionary rates (s²1, s²2), Log-likelihoods for each model, a likelihood ratio test (LRT) and p-value testing whether rates differ significantly.

As we simulated a slow-evolving trait in tree1 and a fast-evolving trait in tree2, we expect a much higher rate (σ²) for tree2 than tree1, and a significant p-value indicating that the multi-rate model fits the data better.

Testing Rate Heterogeneity Within a Tree

So far, we’ve compared evolutionary rates across trees (i.e., between clades), but what if we want to explore rate heterogeneity within a single phylogeny?

There are two main approaches:

Specify regimes a priori based on biological expectations (e.g., habitat, geography, parity mode).
Use data-driven approaches that infer rate shifts without prior assumptions (we’ll explore this shortly with BAMM).

Let’s start with the first approach using the viper dataset and the function brownie.lite() from the phytools package.

Approach 1

Step 1. Define regimes using Stochastic Character Mapping

We’ll use a method called stochastic character mapping (Huelsenbeck et al. 2003) to reconstruct the likely ancestral states of habitat use across the viper phylogeny. These reconstructed states will define the regimes across which we want to test for body size rate differences.

set.seed(123)
hab.vipers.vec <- setNames(bs.vipers$habitat, bs.vipers$species)

simmap.vipers <- make.simmap(vipers.tree, hab.vipers.vec, model = "ARD", 
Q = "empirical", nsim = 1) ##This will take a few seconds...

Why this setup? - model = “ARD”: “All Rates Different”; allows all transitions between habitat states to be freely estimated. - Q = “empirical”: uses the transition matrix inferred from observed species data to guide the mapping.

In practice, you would want to generate many stochastic maps (e.g., nsim = 100) to incorporate uncertainty in the ancestral reconstruction during downstream analyses.

plotSimmap(simmap.vipers, fsize = 0.1)

Step 2. Compare rates using brownie.lite()

Now we can use brownie.lite() to compare two models: - A single-rate Brownian motion model - A multi-rate model where each regime has its own evolutionary rate

Because this can be computationally slow (especially for large trees), we’ll load a precomputed result instead of running it live.

#brownie.vipers <- brownie.lite(simmap.vipers, bs.vipers.vec, test = “simulation”, nsim = 5) #This is how you would run it.

brownie.vipers <- readRDS(file = "brownieRes_vipers.Rdata")
brownie.vipers

Interpreting the output:

s^2(1) to s^2(4) are the estimated rates for each regime. Note that all rates are similar.
A p-value of 0.6 suggests the multi-rate model does not significantly improve the fit. Therefore, the data do not support rate heterogeneity across regimes.

Approach 2

Now, we will use a method called BAMM - Bayesian Analysis of Macroevolutionary Mixtures to detect rate shifts without a priori regimes (modified from Gustavo Burin’s original tutorial).

BAMM uses reversible-jump Markov Chain Monte Carlo (rjMCMC) to model complex dynamics of speciation, extinction, and trait evolution on phylogenies. In this tutorial, we will focus exclusively on estimating trait evolutionary rates using BAMM. The first step is to download the BAMM executable from the official BAMM website. Installation instructions are provided for each operating system (Windows, macOS, Linux) and should be followed carefully to ensure the program is properly configured.

It’s best practice to create a separate folder for each BAMM run or project. This keeps all input and output files organized in one place and reduces the #chance of errors or file mix-ups. To run a trait evolution analysis in BAMM, you need three key files:

BAMM executable file: the program itself
Control file: a plain-text configuration file that defines parameters and input paths
Tree file: a phylogenetic tree in Newick format (txt)

We’ll work with a phylogenetic tree file named “vipers_tree.txt” and a body size dataset named “body_size.txt”. The control file for the analysis will be named “traitcontrol.txt”.

In the folder you created for this analysis, make sure to place the three essential files: The BAMM executable file, the control file (traitcontrol.txt), and the tree file (vipers_tree.txt).

You’re now ready to edit the control file.

The Control File

The control file contains all the necessary settings to run BAMM, including the names of your input files and the configuration parameters for the analysis. It supports many different options, some of which are pre-set by default.

Each option is commented within the file to describe its function. You can find further explanations and examples for each parameter on the BAMM documentation website.

Carefully review and update the control file to match the names of your tree and trait files, and to specify the number of generations, sampling frequency, priors, and output file paths.

Running BAMM

Once your input files are ready, open your terminal and navigate to the folder containing the BAMM executable, the control file, and the phylogenetic tree.

To start the analysis, run the following command in the terminal:

./bamm -c traitcontrol.txt

Visualizing Results with BAMMtools (back in R)

Once BAMM has finished running, you can use the BAMMtools R package to explore and visualize the results.

tree <- read.tree("vipers_tree_BAMM.txt")
plot(tree, cex = 0.35)
axisPhylo()

After running your BAMM analysis, you’ll need to read the output data into R to explore and visualize the results.

First, read in the event data file generated by BAMM

events <- read.csv("event_data_tree1.txt")

Next, use the getEventData() function from the BAMMtools package to generate a bammdata object. This object contains the phylogeny and all the macroevolutionary sampled by BAMM. Many downstream BAMMtools functions operate directly on this object.

ed <- getEventData(tree, events, burnin = 0.25, type = "trait")
head(ed$eventData)

Now, you can visualize the estimated trait evolutionary rates across the tree:

bamm_plot <- plot.bammdata(ed, lwd = 2, labels = TRUE, cex = 0.15)
addBAMMlegend(bamm_plot, location = "topleft")

#This will produce a tree where branch colors represent evolutionary rates.

Tutorial 1 - Trait evolution on trees

Evolutionary Quantitative Genetics Workshop 2025

Laura Alencar