Using Markov Chains for time series data

Question

Here is how my data frame is currently structured (first 6 rows). The data I used is available here.

ID  date        sps     time    pp  datetime            km
1   2012-06-19  MICRO   2:19    0   2012-06-19 02:19    80
2   2012-06-21  MUXX    23:23   1   2012-06-21 23:23    80
3   2012-07-15  MAMO    11:38   0   2012-07-15 11:38    80
4   2012-07-20  MICRO   22:19   0   2012-07-20 22:19    80
5   2012-07-29  MICRO   23:03   0   2012-07-29 23:03    80
8   2012-08-07  PRLO    2:04    0   2012-08-07 02:04    80

The columns stand for:

ID: identification number
date: date of observation
km: location
sps: species code
time: time of observation
pp: codes if the species (sps) observed is a predator (1) or prey (0)
datetime: conversion of date and time rows to as.POSIXct format

The question I am trying to answer:

Does the likelihood of a predator (pp = 1) being observed increase with the number of times prey (pp = 0) are observed (e.g. is prey followed by predator more likely than prey followed by prey, etc.) at each location (km)?

Background:

For each location (km) there is a unique row in my data with the time the image is taken and an identification of whether the photo is of a predator or prey.
There are many photos of predators and prey at each location.
For each location, observations of predators and prey are made in temporal sequence.

What I am trying to do:

For each location, exhaustively count the number of temporal pairs of observations: prey-prey, prey-predator, predator-prey and predator-predator.
For each location, shuffle (randomize) the observations of pred/prey (i.e. maintain the same total number of pred/prey as observed) and count the number of pairs of observations generated by the shuffle: prey-prey, prey-predator, predator-prey and predator-predator. Record. Calculate the difference between number of observations in step (1) and that found by each shuffle. Repeat 1000 times. This should give me a sense of how likely the original observation of prey-prey, prey-predator, predator-prey, and predator-predator paired sequences are given the observed proportion of pred/prey.

My question:

Assuming a Markov Chain model is the most appropriate way to answer my question, how would I be able to code this in R?

At this point, I believe the R package I should be using is markovchain, but I don't know how to translate steps 1 and 2 into R code.

thc · Accepted Answer

Here is some code:

library(dplyr)
library(markovchain)

Read in data and format time stamps

data <- read.table("~/Downloads/d1.txt", sep="	", header=T, stringsAsFactors=F)
data$datetime <- as.POSIXct(data$datetime)

Sort by time

data <- data[order(data$datetime, decreasing=F),]

For each location, create a sequence of pp

data <- data %>% group_by(km) %>% summarize(pp_chain=list(pp)) %>% as.data.frame 
pp_chains <- data$pp_chain; names(pp_chains) <- data$km

Fit the Markov model on all sequence chains

fit <- markovchainFit(pp_chains)

Estimate transition probabilities:

print(fit$estimate)
          0          1
0 0.9116832 0.08831677
1 0.5250852 0.47491476

This matrix says the probability of transition from 0 to 0 is 0.91; the probability of transition from 0 to 1 is 0.088; and so on.

Estimate steady states:

print(steadyStates(fit$estimate))
             0         1
[1,] 0.8560214 0.1439786

We can compare the transition probabilities with the steady state. If transition were just random, then transition would not depend on the previous state and they would just be equal to the steady state values.

Since that's not the case, it's clear that if you have a predator you're much more likely to have another predator and vice versa.

Using Markov Chains for time series data

Answers (1)

Related Questions