Blundering Ecologist
Blundering Ecologist

Reputation: 1315

Using Markov Chains for time series data

Here is how my data frame is currently structured (first 6 rows). The data I used is available here.

ID  date        sps     time    pp  datetime            km
1   2012-06-19  MICRO   2:19    0   2012-06-19 02:19    80
2   2012-06-21  MUXX    23:23   1   2012-06-21 23:23    80
3   2012-07-15  MAMO    11:38   0   2012-07-15 11:38    80
4   2012-07-20  MICRO   22:19   0   2012-07-20 22:19    80
5   2012-07-29  MICRO   23:03   0   2012-07-29 23:03    80
8   2012-08-07  PRLO    2:04    0   2012-08-07 02:04    80

The columns stand for:

The question I am trying to answer:

Does the likelihood of a predator (pp = 1) being observed increase with the number of times prey (pp = 0) are observed (e.g. is prey followed by predator more likely than prey followed by prey, etc.) at each location (km)?

Background:

What I am trying to do:

  1. For each location, exhaustively count the number of temporal pairs of observations: prey-prey, prey-predator, predator-prey and predator-predator.

  2. For each location, shuffle (randomize) the observations of pred/prey (i.e. maintain the same total number of pred/prey as observed) and count the number of pairs of observations generated by the shuffle: prey-prey, prey-predator, predator-prey and predator-predator. Record. Calculate the difference between number of observations in step (1) and that found by each shuffle. Repeat 1000 times. This should give me a sense of how likely the original observation of prey-prey, prey-predator, predator-prey, and predator-predator paired sequences are given the observed proportion of pred/prey.

My question:

Assuming a Markov Chain model is the most appropriate way to answer my question, how would I be able to code this in R?

At this point, I believe the R package I should be using is markovchain, but I don't know how to translate steps 1 and 2 into R code.

Upvotes: 5

Views: 2749

Answers (1)

thc
thc

Reputation: 9705

Here is some code:

library(dplyr)
library(markovchain)

Read in data and format time stamps

data <- read.table("~/Downloads/d1.txt", sep="\t", header=T, stringsAsFactors=F)
data$datetime <- as.POSIXct(data$datetime)

Sort by time

data <- data[order(data$datetime, decreasing=F),]

For each location, create a sequence of pp

data <- data %>% group_by(km) %>% summarize(pp_chain=list(pp)) %>% as.data.frame 
pp_chains <- data$pp_chain; names(pp_chains) <- data$km

Fit the Markov model on all sequence chains

fit <- markovchainFit(pp_chains)

Estimate transition probabilities:

print(fit$estimate)
          0          1
0 0.9116832 0.08831677
1 0.5250852 0.47491476

This matrix says the probability of transition from 0 to 0 is 0.91; the probability of transition from 0 to 1 is 0.088; and so on.

Estimate steady states:

print(steadyStates(fit$estimate))
             0         1
[1,] 0.8560214 0.1439786

We can compare the transition probabilities with the steady state. If transition were just random, then transition would not depend on the previous state and they would just be equal to the steady state values.

Since that's not the case, it's clear that if you have a predator you're much more likely to have another predator and vice versa.

Upvotes: 4

Related Questions