Blundering Ecologist
Blundering Ecologist

Reputation: 1315

Partitioning data to determine (ordered) time between observations

I am not 100% sure how to formulate my question because I don't know the formal names are for what it is that I am trying to do with my dataset. Based on previous questions, there appears to be some way to address what I am trying to, but I am unable at making the logical jump from their problem to my own.

I have attached a sample of my data here.

The first thing I did with my data was add a column indicating which species (sps) are predators (coded as 1) and which species are prey (coded as 0).

#specify which are predators and prey
d1 = d1 %>%
    group_by(sps) %>% #grouped by species
    mutate(pp=ifelse(sps %in% c("MUXX", "MUVI","MEME"), 1,0)) #mutate to specify predators as 1 and prey as 0

My data is structured as such:

head(d1) #visualize the first few lines of the data
# A tibble: 6 x 8
# Groups:   sps [4]
 ID       date    km   culv.id   type   sps   time    pp
<int>     <fctr> <dbl>  <fctr> <fctr> <fctr> <fctr> <dbl>
1    2012-06-19    80     A      DCC   MICRO   2:19    0
2    2012-06-21    80     A      DCC   MUXX   23:23    1
3    2012-07-15    80     A      DCC   MAMO   11:38    0
4    2012-07-20    80     A      DCC   MICRO  22:19    0
5    2012-07-29    80     A      DCC   MICRO  23:03    0
6    2012-08-07    80     A      DCC   PRLO    2:04    0

Here is also the output for dput(head(d1)):

structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 8L), date = c("2012-06-19", "2012-06-21", "2012-07-15", "2012-07-20", "2012-07-29", "2012-08-07" ), km = c(80L, 80L, 80L, 80L, 80L, 80L), culv.id = c("A", "A", "A", "A", "A", "A"), type = c("DCC", "DCC", "DCC", "DCC", "DCC", "DCC"), sps = c("MICRO", "MUXX", "MAMO", "MICRO", "MICRO", "PRLO" ), time = c("2:19", "23:23", "11:38", "22:19", "23:03", "2:04" ), pp = c(0, 1, 0, 0, 0, 0)), .Names = c("ID", "date", "km", "culv.id", "type", "sps", "time", "pp"), row.names = c(NA, 6L ), class = "data.frame")

I also converted the time and date using the following code:

d1$datetime=strftime(paste(d1$date,d1$time),'%Y-%m-%d %H:%M',usetz=FALSE) #converting the date/time into a new format 

The (most) relevant columns are date, time, and pp (where 1 = predator species and 0 = prey species).

I am now trying to figure out how to extract the following information (average +/- std):

  1. average time between prey-prey observations
  2. average time between prey-predator observations
  3. average time between predator-predator observations
  4. average time between predator-prey observations

To put one of these examples (#2) into words:

What is the average time between when a prey species (pp = 0) is first seen followed by a predator species (pp = 1)?

I am trying to figure out how to do this for my dataset overall first. I think that once I figure out how to do that, it should be fairly straightforward to restrict the data.

Upvotes: 0

Views: 99

Answers (2)

Luke C
Luke C

Reputation: 10301

Here is a data.table (and lubridate) version that might address your problem:

Using a selection of your posted data (posted at bottom), with a slight modification to your datetime creation so that the format works with data.table:

d1$datetime <- as.POSIXct(strptime(paste(d1$date,d1$time),'%Y-%m-%d %H:%M'))

Convert to a data table:

d1 <- as.data.table(d1)

Calculate time differences for equal pp values for animals by specialization (prey or predator), less than (pred to prey), or greater than (prey to pred).

d1$class.class <-  d1[d1, difftime(x.datetime, i.datetime, units = "days"), 
                   on = .(datetime > datetime, pp == pp), mult = "first"]

d1$prey.pred <-d1[d1, x.datetime - i.datetime, 
                   on = .(datetime > datetime, pp > pp ), mult = "first"]

d1$pred.prey <- d1[d1, x.datetime - i.datetime, 
                   on = .(datetime > datetime, pp < pp), mult = "first"]

Gives you a column for each:

> head(d1[, 7:ncol(d1)])
    time pp            datetime    class.class      prey.pred     pred.prey
1:  2:19  0 2012-06-19 02:19:00 26.388194 days  2.877778 days       NA days
2: 23:23  1 2012-06-21 23:23:00 74.177083 days        NA days 23.51042 days
3: 11:38  0 2012-07-15 11:38:00  5.445139 days 50.666667 days       NA days
4: 22:19  0 2012-07-20 22:19:00  9.030556 days 45.221528 days       NA days
5: 23:03  0 2012-07-29 23:03:00  8.125694 days 36.190972 days       NA days
6:  2:04  0 2012-08-07 02:04:00  1.911111 days 28.065278 days       NA days

And you can get summary statistics as you like:

d1[by = sps,, .(mean.same.class = mean(class.class, na.rm = TRUE), 
                sd.same.class = sd(class.class, na.rm = TRUE),
                mean.prey.pred = mean(prey.pred, na.rm = TRUE),
                sd.prey.pred = sd(prey.pred, na.rm = TRUE),
                mean.pred.prey = mean(pred.prey, na.rm = TRUE),
                sd.pred.prey = sd(pred.prey, na.rm = TRUE))]

     sps mean.same.class sd.same.class mean.prey.pred sd.prey.pred mean.pred.prey sd.pred.prey
1: MICRO   7.886237 days     8.0547631  18.80733 days    15.504646       NaN days           NA
2:  MUXX  42.073611 days    45.4011658       NaN days           NA  13.01366 days     9.315697
3:  MAMO   5.445139 days            NA  50.66667 days           NA       NaN days           NA
4:  PRLO   2.475694 days     0.7984414  26.62708 days     2.033914       NaN days           NA
5:  LEAM   2.897222 days            NA  10.11597 days           NA       NaN days           NA

Libraries: data.table, lubridate

Data:

> dput(d1)
structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L, 12L, 
13L, 14L, 15200L, 15201L, 15199L, 15177L, 15178L, 15204L, 15205L
), date = c("2012-06-19", "2012-06-21", "2012-07-15", "2012-07-20", 
"2012-07-29", "2012-08-07", "2012-08-08", "2012-08-09", "2012-08-13", 
"2012-08-13", "2012-08-25", "2012-08-27", "2012-09-04", "2012-09-09", 
"2012-09-11", "2012-09-14", "2012-09-23", "2012-09-26", "2012-09-27"
), km = c(80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 
80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L), culv.id = c("A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A"), type = c("DCC", "DCC", "DCC", "DCC", "DCC", 
"DCC", "DCC", "DCC", "DCC", "DCC", "DCC", "DCC", "DCC", "DCC", 
"DCC", "DCC", "DCC", "DCC", "DCC"), sps = c("MICRO", "MUXX", 
"MAMO", "MICRO", "MICRO", "PRLO", "MICRO", "PRLO", "MICRO", "MICRO", 
"LEAM", "MICRO", "MUXX", "MICRO", "MICRO", "MUXX", "MICRO", "MICRO", 
"MICRO"), time = c("2:19", "23:23", "11:38", "22:19", "23:03", 
"2:04", "23:56", "23:06", "0:04", "0:46", "0:51", "22:23", "3:38", 
"21:08", "0:40", "2:55", "22:09", "20:46", "3:20"), pp = c(0, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0)), class = "data.frame", .Names = c("ID", 
"date", "km", "culv.id", "type", "sps", "time", "pp"), row.names = c(NA, 
-19L))

Edit:

I'm not really sure about mixing tidyverse and data.table ideologies, but you could potentially do what you described in comments using do. For example, make a modified version of df:

d1 <- as.data.table(d1)
d1$datetime <- as.POSIXct(strptime(paste(d1$date,d1$time),'%Y-%m-%d %H:%M'))
d1Mod <- d1
d1Mod$km[10:nrow(d1Mod)] <- 90

Then, define the data.table bit as a function:

foo <- function(df_) {
    df_$class.class <-  df_[df_, difftime(x.datetime, i.datetime, units = "days"),
                          on = .(datetime > datetime, pp == pp), mult = "first"]

    df_$prey.pred <-df_[df_, x.datetime - i.datetime,
                      on = .(datetime > datetime, pp > pp ), mult = "first"]

    df_$pred.prey <- df_[df_, x.datetime - i.datetime,
                       on = .(datetime > datetime, pp < pp), mult = "first"]

    return(df_)
}

Running d1 %>% group_by(km) %>% do(foo(as.data.table(.))) gets you the same output as in the original answer above (since all km values are 80). If you run it on the modified d1Mod you get an output that looks like it has been grouped by km:

> d1Mod %>% 
+     group_by(km) %>%
+     do(foo(as.data.table(.)))
# A tibble: 19 x 12
# Groups:   km [2]
      ID date          km culv.id type  sps   time     pp datetime            class.class     prey.pred      pred.prey     
   <int> <chr>      <dbl> <chr>   <chr> <chr> <chr> <dbl> <dttm>              <time>          <time>         <time>        
 1     1 2012-06-19    80 A       DCC   MICRO 2:19      0 2012-06-19 02:19:00 26.3881944 days  2.877778 days        NA days
 2     2 2012-06-21    80 A       DCC   MUXX  23:23     1 2012-06-21 23:23:00         NA days        NA days 23.510417 days
 3     3 2012-07-15    80 A       DCC   MAMO  11:38     0 2012-07-15 11:38:00  5.4451389 days        NA days        NA days
 4     4 2012-07-20    80 A       DCC   MICRO 22:19     0 2012-07-20 22:19:00  9.0305556 days        NA days        NA days
 5     5 2012-07-29    80 A       DCC   MICRO 23:03     0 2012-07-29 23:03:00  8.1256944 days        NA days        NA days
 6     8 2012-08-07    80 A       DCC   PRLO  2:04      0 2012-08-07 02:04:00  1.9111111 days        NA days        NA days
 7     9 2012-08-08    80 A       DCC   MICRO 23:56     0 2012-08-08 23:56:00  0.9652778 days        NA days        NA days
 8    10 2012-08-09    80 A       DCC   PRLO  23:06     0 2012-08-09 23:06:00  3.0402778 days        NA days        NA days
 9    11 2012-08-13    80 A       DCC   MICRO 0:04      0 2012-08-13 00:04:00         NA days        NA days        NA days
10    12 2012-08-13    90 A       DCC   MICRO 0:46      0 2012-08-13 00:46:00 12.0034722 days 22.119444 days        NA days
11    13 2012-08-25    90 A       DCC   LEAM  0:51      0 2012-08-25 00:51:00  2.8972222 days 10.115972 days        NA days
12    14 2012-08-27    90 A       DCC   MICRO 22:23     0 2012-08-27 22:23:00 12.9479167 days  7.218750 days        NA days
13 15200 2012-09-04    90 A       DCC   MUXX  3:38      1 2012-09-04 03:38:00  9.9701389 days        NA days  5.729167 days
14 15201 2012-09-09    90 A       DCC   MICRO 21:08     0 2012-09-09 21:08:00  1.1472222 days  4.240972 days        NA days
15 15199 2012-09-11    90 A       DCC   MICRO 0:40      0 2012-09-11 00:40:00 12.8951389 days  3.093750 days        NA days
16 15177 2012-09-14    90 A       DCC   MUXX  2:55      1 2012-09-14 02:55:00         NA days        NA days  9.801389 days
17 15178 2012-09-23    90 A       DCC   MICRO 22:09     0 2012-09-23 22:09:00  2.9423611 days        NA days        NA days
18 15204 2012-09-26    90 A       DCC   MICRO 20:46     0 2012-09-26 20:46:00  0.2736111 days        NA days        NA days
19 15205 2012-09-27    90 A       DCC   MICRO 3:20      0 2012-09-27 03:20:00         NA days        NA days        NA days

However, you'll have to do some checking to make sure that the calculations are actually doing what you need- I don't have example output or actual km/year info to truth these results against (read: I don't know what I'm looking at!).

Note also that I think arrange is irrelevant for the operations here, considering that the datetime gets sorted in the function.

Upvotes: 2

VFreguglia
VFreguglia

Reputation: 2311

I'll use the piece on the comments as an example:

d1 = structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 8L), date = c("2012-06-19", "2012-06-21", "2012-07-15", "2012-07-20", "2012-07-29", "2012-08-07" ), km = c(80L, 80L, 80L, 80L, 80L, 80L), culv.id = c("A", "A", "A", "A", "A", "A"), type = c("DCC", "DCC", "DCC", "DCC", "DCC", "DCC"), sps = c("MICRO", "MUXX", "MAMO", "MICRO", "MICRO", "PRLO" ), time = c("2:19", "23:23", "11:38", "22:19", "23:03", "2:04" ), pp = c(0, 1, 0, 0, 0, 0)), .Names = c("ID", "date", "km", "culv.id", "type", "sps", "time", "pp"), row.names = c(NA, 6L ), class = "data.frame")

We add the datetime column just as you specified:

d1$datetime=strftime(paste(d1$date,d1$time),'%Y-%m-%d %H:%M',usetz=FALSE)

First, add a column indicating which sequence of happened prey/predator and the time between observations (we remove the first row because there is no information about the previous observation). Note that, the timedif is a numerical value indicating the number of days.

d1 = d1 %>% mutate(prev = lag(pp))
d1 = d1 %>% mutate(timedif = as.numeric(as.POSIXct(datetime) - lag(as.POSIXct(datetime))))
d1 = d1[2:nrow(d1),] %>% mutate(seque = as.factor(paste0(pp,prev)))

At this point, your table looks like

> d1
  ID       date km culv.id type   sps  time pp         datetime prev   timedif seque
1  2 2012-06-21 80       A  DCC  MUXX 23:23  1 2012-06-21 23:23    0  2.877778    10
2  3 2012-07-15 80       A  DCC  MAMO 11:38  0 2012-07-15 11:38    1 23.510417    01
3  4 2012-07-20 80       A  DCC MICRO 22:19  0 2012-07-20 22:19    0  5.445139    00
4  5 2012-07-29 80       A  DCC MICRO 23:03  0 2012-07-29 23:03    0  9.030556    00
5  8 2012-08-07 80       A  DCC  PRLO  2:04  0 2012-08-07 02:04    0  8.125694    00

After that, just take the wanted statistics for each group by using

avg = d1 %>% group_by(seque) %>% summarise(mean(timedif))
sdevs = d1 %>% group_by(seque) %>% summarise(sd(timedif))

We obtain

>avg
# A tibble: 3 x 2
   seque `mean(timedif)`
  <fctr>           <dbl>
1     00        7.533796
2     01       23.510417
3     10        2.877778

> sdevs
# A tibble: 3 x 2
   seque `sd(timedif)`
  <fctr>         <dbl>
1     00      1.864554
2     01            NA
3     10            NA

Note that the standard deviation is not computed because we only have one observation in the sample dataset for these categories.

Upvotes: 1

Related Questions