Reputation: 1315
Here is how my data frame is currently structured (first 6 rows). The data I used is available here.
ID date sps time pp datetime km
1 2012-06-19 MICRO 2:19 0 2012-06-19 02:19 80
2 2012-06-21 MUXX 23:23 1 2012-06-21 23:23 80
3 2012-07-15 MAMO 11:38 0 2012-07-15 11:38 80
4 2012-07-20 MICRO 22:19 0 2012-07-20 22:19 80
5 2012-07-29 MICRO 23:03 0 2012-07-29 23:03 80
8 2012-08-07 PRLO 2:04 0 2012-08-07 02:04 80
The columns stand for:
ID
: identification numberdate
: date of observationkm
: locationsps
: species codetime
: time of observationpp
: codes if the species (sps
) observed is a predator (1) or prey (0)datetime
: conversion of date
and time
rows to as.POSIXct
formatThe question I am trying to answer:
Does the likelihood of a predator (
pp
= 1) being observed increase with the number of times prey (pp
= 0) are observed (e.g. is prey followed by predator more likely than prey followed by prey, etc.) at each location (km
)?
Background:
km
) there is a unique row in my data with the time the image is taken and an identification of whether the photo is of a predator or prey.What I am trying to do:
For each location, exhaustively count the number of temporal pairs of observations: prey-prey, prey-predator, predator-prey and predator-predator.
For each location, shuffle (randomize) the observations of pred/prey (i.e. maintain the same total number of pred/prey as observed) and count the number of pairs of observations generated by the shuffle: prey-prey, prey-predator, predator-prey and predator-predator. Record. Calculate the difference between number of observations in step (1) and that found by each shuffle. Repeat 1000 times. This should give me a sense of how likely the original observation of prey-prey, prey-predator, predator-prey, and predator-predator paired sequences are given the observed proportion of pred/prey.
My question:
Assuming a Markov Chain model is the most appropriate way to answer my question, how would I be able to code this in R?
At this point, I believe the R package I should be using is markovchain
, but I don't know how to translate steps 1 and 2 into R code.
Upvotes: 5
Views: 2749
Reputation: 9705
Here is some code:
library(dplyr)
library(markovchain)
Read in data and format time stamps
data <- read.table("~/Downloads/d1.txt", sep="\t", header=T, stringsAsFactors=F)
data$datetime <- as.POSIXct(data$datetime)
Sort by time
data <- data[order(data$datetime, decreasing=F),]
For each location, create a sequence of pp
data <- data %>% group_by(km) %>% summarize(pp_chain=list(pp)) %>% as.data.frame
pp_chains <- data$pp_chain; names(pp_chains) <- data$km
Fit the Markov model on all sequence chains
fit <- markovchainFit(pp_chains)
Estimate transition probabilities:
print(fit$estimate)
0 1
0 0.9116832 0.08831677
1 0.5250852 0.47491476
This matrix says the probability of transition from 0 to 0 is 0.91; the probability of transition from 0 to 1 is 0.088; and so on.
Estimate steady states:
print(steadyStates(fit$estimate))
0 1
[1,] 0.8560214 0.1439786
We can compare the transition probabilities with the steady state. If transition were just random, then transition would not depend on the previous state and they would just be equal to the steady state values.
Since that's not the case, it's clear that if you have a predator you're much more likely to have another predator and vice versa.
Upvotes: 4