Partitioning data to determine (ordered) time between observations

Question

I am not 100% sure how to formulate my question because I don't know the formal names for what I am trying to do with my dataset. Based on previous questions, there appears to be a way to do it, but I am unable to make the logical jump from their problems to my own.

I have attached a sample of my data here.

The first thing I did with my data was add a column indicating which species (sps) are predators (coded as 1) and which species are prey (coded as 0).

library(dplyr) #provides %>%, group_by and mutate

#specify which are predators and prey
d1 = d1 %>%
    group_by(sps) %>% #grouped by species
    mutate(pp=ifelse(sps %in% c("MUXX", "MUVI","MEME"), 1,0)) #mutate to specify predators as 1 and prey as 0

My data is structured as such:

head(d1) #visualize the first few lines of the data
# A tibble: 6 x 8
# Groups:   sps [4]
 ID       date    km   culv.id   type   sps   time    pp
            
1    2012-06-19    80     A      DCC   MICRO   2:19    0
2    2012-06-21    80     A      DCC   MUXX   23:23    1
3    2012-07-15    80     A      DCC   MAMO   11:38    0
4    2012-07-20    80     A      DCC   MICRO  22:19    0
5    2012-07-29    80     A      DCC   MICRO  23:03    0
6    2012-08-07    80     A      DCC   PRLO    2:04    0

Here is also the output for dput(head(d1)):

structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 8L), date = c("2012-06-19", "2012-06-21", "2012-07-15", "2012-07-20", "2012-07-29", "2012-08-07" ), km = c(80L, 80L, 80L, 80L, 80L, 80L), culv.id = c("A", "A", "A", "A", "A", "A"), type = c("DCC", "DCC", "DCC", "DCC", "DCC", "DCC"), sps = c("MICRO", "MUXX", "MAMO", "MICRO", "MICRO", "PRLO" ), time = c("2:19", "23:23", "11:38", "22:19", "23:03", "2:04" ), pp = c(0, 1, 0, 0, 0, 0)), .Names = c("ID", "date", "km", "culv.id", "type", "sps", "time", "pp"), row.names = c(NA, 6L ), class = "data.frame")

I also converted the time and date using the following code:

d1$datetime=strftime(paste(d1$date,d1$time),'%Y-%m-%d %H:%M',usetz=FALSE) #converting the date/time into a new format

The (most) relevant columns are date, time, and pp (where 1 = predator species and 0 = prey species).

I am now trying to figure out how to extract the following information (average +/- std):

average time between prey-prey observations
average time between prey-predator observations
average time between predator-predator observations
average time between predator-prey observations

To put one of these examples (#2) into words:

What is the average time between when a prey species (pp = 0) is first seen followed by a predator species (pp = 1)?

I am trying to figure out how to do this for my dataset overall first. I think that once I figure out how to do that, it should be fairly straightforward to restrict the data.

VFreguglia · Accepted Answer

I'll use the sample posted in the comments as an example:

d1 = structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 8L), date = c("2012-06-19", "2012-06-21", "2012-07-15", "2012-07-20", "2012-07-29", "2012-08-07" ), km = c(80L, 80L, 80L, 80L, 80L, 80L), culv.id = c("A", "A", "A", "A", "A", "A"), type = c("DCC", "DCC", "DCC", "DCC", "DCC", "DCC"), sps = c("MICRO", "MUXX", "MAMO", "MICRO", "MICRO", "PRLO" ), time = c("2:19", "23:23", "11:38", "22:19", "23:03", "2:04" ), pp = c(0, 1, 0, 0, 0, 0)), .Names = c("ID", "date", "km", "culv.id", "type", "sps", "time", "pp"), row.names = c(NA, 6L ), class = "data.frame")

We add the datetime column just as you specified:

d1$datetime=strftime(paste(d1$date,d1$time),'%Y-%m-%d %H:%M',usetz=FALSE)
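One caveat, as a hedged aside: as.POSIXct() with no tz argument interprets the strings in the session's local time zone, so a daylight-saving change between two observations can shift a difference by an hour. A minimal sketch, assuming the recorded times can be treated as UTC (that time zone is my assumption, not something stated in the question), parses the date and time directly and asks for the differences in days:

```r
# Sample dates and times from the question
date <- c("2012-06-19", "2012-06-21", "2012-07-15")
time <- c("2:19", "23:23", "11:38")

# Parse straight to POSIXct with an explicit time zone.
# tz = "UTC" is an assumption; use the zone the data was recorded in.
datetime <- as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M", tz = "UTC")

# Differences between consecutive observations, with explicit units
gaps <- as.numeric(difftime(datetime[-1], datetime[-length(datetime)], units = "days"))
print(round(gaps, 6))
```

If the data was recorded in a zone without daylight saving, the result is the same as with the local-time approach used in the rest of this answer.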

First, add a column indicating which prey/predator sequence occurred and a column with the time between observations (we remove the first row because it has no previous observation). Note that timedif is a numeric value giving the number of days, and that seque pastes the current pp before the previous one, so 10 means a predator observed right after a prey.

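library(dplyr)
d1 = d1 %>% mutate(prev = lag(pp)) #pp of the previous observation
d1 = d1 %>% mutate(timedif = as.numeric(difftime(as.POSIXct(datetime), lag(as.POSIXct(datetime)), units = "days"))) #explicit units; subtracting with "-" picks units automatically
d1 = d1[2:nrow(d1),] %>% mutate(seque = as.factor(paste0(pp,prev))) #current pp followed by previous pp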

At this point, your table looks like

> d1
  ID       date km culv.id type   sps  time pp         datetime prev   timedif seque
1  2 2012-06-21 80       A  DCC  MUXX 23:23  1 2012-06-21 23:23    0  2.877778    10
2  3 2012-07-15 80       A  DCC  MAMO 11:38  0 2012-07-15 11:38    1 23.510417    01
3  4 2012-07-20 80       A  DCC MICRO 22:19  0 2012-07-20 22:19    0  5.445139    00
4  5 2012-07-29 80       A  DCC MICRO 23:03  0 2012-07-29 23:03    0  9.030556    00
5  8 2012-08-07 80       A  DCC  PRLO  2:04  0 2012-08-07 02:04    0  8.125694    00
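Since seque is built as paste0(pp, prev), each code reads current-then-previous, which is easy to misread. As an optional sketch, the codes can be turned into labels in chronological order. The label text and the column name seq_label are my own choices for illustration, and the small data frame only reproduces the pp and prev columns of the table above:

```r
library(dplyr)

# pp and prev columns as in the table above
d <- data.frame(pp = c(1, 0, 0, 0, 0), prev = c(0, 1, 0, 0, 0))

# Label each transition as "previous-current", i.e. in the order the
# animals were actually seen (seq_label is a hypothetical column name)
d <- d %>%
  mutate(seq_label = paste(ifelse(prev == 1, "predator", "prey"),
                           ifelse(pp == 1, "predator", "prey"),
                           sep = "-"))

print(d)
```

This is only a readability aid; grouping by seq_label instead of seque gives the same groups.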

After that, compute the desired statistics for each group with

avg = d1 %>% group_by(seque) %>% summarise(mean(timedif))
sdevs = d1 %>% group_by(seque) %>% summarise(sd(timedif))

We obtain

> avg
# A tibble: 3 x 2
   seque `mean(timedif)`
             
1     00        7.533796
2     01       23.510417
3     10        2.877778

> sdevs
# A tibble: 3 x 2
   seque `sd(timedif)`
           
1     00      1.864554
2     01            NA
3     10            NA

Note that the standard deviation is NA for the 01 and 10 categories because the sample dataset contains only one observation for each of them.
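If it helps, the mean, the standard deviation and the group size can also be computed in one summarise() call, which makes it easy to see which groups are too small for a standard deviation. This is a sketch using the seque and timedif values from the table above; the output column names mean_days, sd_days and n are my own choices:

```r
library(dplyr)

# seque and timedif values copied from the table above
d <- data.frame(
  seque   = factor(c("10", "01", "00", "00", "00")),
  timedif = c(2.877778, 23.510417, 5.445139, 9.030556, 8.125694)
)

# Mean, standard deviation and group size in a single summarise() call
stats <- d %>%
  group_by(seque) %>%
  summarise(mean_days = mean(timedif),
            sd_days   = sd(timedif),
            n         = n())

print(stats)
```

Groups with n equal to 1 get NA for sd_days, which is the same behaviour as in the separate sdevs table.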
