How to calculate number of events during specific time period

Question

I am trying to count the events (each row is one event) in "df2" that fall within the time periods defined in "df1". I can do this for each whole time period (approximately 5 minutes), but I would like to break each period into smaller chunks (1 minute) and do the same calculation.

df1<- structure(list(Location = 1:10, Lattitude = c(57.140532, 57.140527, 
57.13959, 57.13974, 57.14059, 57.14058, 57.1398, 57.13989, 57.14158, 
57.14386), t_in = structure(c(1455626730, 1455627326, 1455628122, 
1455628644, 1455629174, 1455629708, 1455630230, 1455630765, 1455631396, 
1455631931), class = c("POSIXct", "POSIXt"), tzone = ""), t_out = structure(c(1455627047, 
1455627615, 1455628462, 1455628933, 1455629486, 1455630015, 1455630552, 
1455631070, 1455631719, 1455632242), class = c("POSIXct", "POSIXt"
), tzone = "")), .Names = c("Location", "Lattitude", "t_in", 
"t_out"), class = "data.frame", row.names = c(NA, -10L))

df2<- structure(list(date.time = structure(c(1455630964, 1455630976, 
1455630987, 1455630998, 1455631009, 1455631021, 1455631032, 1455631043, 
1455631054, 1455631066, 1455631077, 1455631088, 1455631099, 1455631111, 
1455631423, 1455631446, 1455631479, 1455631502, 1455631569, 1455631772
), class = c("POSIXct", "POSIXt"), tzone = ""), code = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("1003", "32221"), class = "factor"), 
rec_id = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("301976", 
"301978", "301985", "301988"), class = "factor"), Lattitude = c("57.14066", 
"57.14066", "57.14066", "57.14066", "57.14066", "57.14066", 
"57.14066", "57.14066", "57.14066", "57.14066", "57.14066", 
"57.14066", "57.14066", "57.14066", "57.141869", "57.141869", 
"57.141869", "57.141869", "57.141869", "57.141869"), Longitude = c("2.075702", 
"2.075702", "2.075702", "2.075702", "2.075702", "2.075702", 
"2.075702", "2.075702", "2.075702", "2.075702", "2.075702", 
"2.075702", "2.075702", "2.075702", "2.081576", "2.081576", 
"2.081576", "2.081576", "2.081576", "2.081576"), Location = list(
    8, 8, 8, 8, 8, 8, 8, 8, 8, 8, NA, NA, NA, NA, 9, 9, 9, 
    9, 9, NA)), .Names = c("date.time", "code", "rec_id", 
"Lattitude", "Longitude", "Location"), row.names = 94:113, class = "data.frame")

The function below returns the location from df1 if the date.time in df2 lies between df1$t_in and df1$t_out. This may seem a roundabout way, but it enables later calculations outwith this code.

ids <- as.numeric(df1$Location)
f <- function(x){
  a <- ids[ (df1$t_in < x) & (x < df1$t_out) ]
  if (length(a) == 0) NA else a
}   

df2$Location <- lapply(df2$date.time, f)

The above returns a list, so it needs to be converted to numeric. It is a bit of a faff, but I can't get round it.

df2$Location<- paste(df2$Location)
df2$Location<- as.numeric(df2$Location)
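One way the list-to-numeric detour might be avoided is base R's vapply, which returns an atomic vector directly. The following is only a sketch on made-up times; `intervals`, `events` and `find_location` are names invented for the illustration, and it assumes the intervals do not overlap, so the first match is taken.

```r
# Toy intervals and event times (hypothetical values, not the data above)
intervals <- data.frame(
  Location = 1:2,
  t_in  = as.POSIXct(c("2016-02-16 14:00:00", "2016-02-16 14:10:00"), tz = "UTC"),
  t_out = as.POSIXct(c("2016-02-16 14:05:00", "2016-02-16 14:15:00"), tz = "UTC")
)
events <- as.POSIXct(c("2016-02-16 14:01:30", "2016-02-16 14:07:00",
                       "2016-02-16 14:12:00"), tz = "UTC")

# Return exactly one numeric per event: the matching Location, or NA
find_location <- function(x) {
  hit <- intervals$Location[intervals$t_in < x & x < intervals$t_out]
  if (length(hit) == 0) NA_real_ else as.numeric(hit[1])
}

# vapply checks that every result is a single numeric and returns a plain vector
loc <- vapply(events, find_location, numeric(1))
```

Here `loc` is already a numeric vector, so no paste()/as.numeric() step is needed before filtering with is.na().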

NAs are then removed, as these lie outside the time periods defined in df1 and are thus irrelevant.

df2<-df2[!is.na(df2$Location),]

Then calculate the number of events (i.e. rows) for each rec_id and Location:

library(plyr)
df3 <- ddply(df2, c("rec_id", "Location"), function(df) data.frame(detections = nrow(df)))

  rec_id Location detections
1 301976        9          5
2 301978        8         10

...perfect!
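For comparison, the same kind of per-group count can probably be had without plyr, using base R's aggregate. This is a sketch on a made-up data frame (`obs` is a hypothetical stand-in for df2 after the NA rows are dropped), not a drop-in replacement.

```r
# Hypothetical events: one row per detection
obs <- data.frame(
  rec_id   = c("301976", "301976", "301978", "301978", "301978"),
  Location = c(9, 9, 8, 8, 8)
)

# Give every row a count of 1, then sum within each rec_id/Location group
obs$detections <- 1L
counts <- aggregate(detections ~ rec_id + Location, data = obs, FUN = sum)
```

Groups with no rows do not appear in `counts`, which matches the ddply behaviour above.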

However, I would like to do this for smaller time periods: every minute, to be exact. The periods should start from t_in (df1) at each location and run through until t_out (df1). I can do this with a lot of work in Excel, but surely it can be automated in R (it is a large dataset).

Ultimately, I want to count the number of events (nrow) at each location for each 1-minute period between t_in and t_out in df1,

such as (just a visual example, not actual data):

rec_id Location minute (or period) detections
301976        9                  1          1
301976        9                  2          2
301976        9                  3          0
301976        9                  4          0
301976        9                  5          2
301978        8                  1          4
301978        8                  2          3
301978        8                  3          1
301978        8                  4          0
301978        8                  5          2

I can create the intervals for the first location, but I'm not sure how to apply this further:

seq(from = head(df1$t_in,1), to = head(df1$t_out,1) , by = "mins")

ekstroem · Accepted Answer

I think the following can be used to generate a new df1 data frame in which each time period is split into one-minute intervals; you can then apply the steps you go through above to the new df1.

They can possibly be combined but I just wanted to make sure it actually gets you what you want.

First we expand the time intervals in your original data frame and produce a list of the expanded periods. Each row in df1 becomes an element in a list.

# lapply always returns a list, even when every sequence has the same length
res1 <- lapply(seq_len(nrow(df1)), function(i) {
                 seq(from = df1$t_in[i], to = df1$t_out[i], by = "mins")})

Then we convert each sequence in the list to a data frame with two columns (the start and end of each one-minute interval):

# head/tail also behave sensibly when a sequence has only one element
res2 <- lapply(res1, function(x) {
                 data.frame(t_in = head(x, -1), t_out = tail(x, -1)) })

And finally we merge everything together

df1v2 <- Reduce(function(...) merge(..., all = TRUE), res2)
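If the original location and the minute number within each visit need to survive the expansion, one possible variation is to build them while expanding and stack the pieces with rbind. This is a sketch on made-up times; `visits`, `expand_visit` and the `minute` column are names introduced here for illustration only.

```r
# Hypothetical visits: two locations with different durations
visits <- data.frame(
  Location = c(8, 9),
  t_in  = as.POSIXct(c("2016-02-16 14:00:00", "2016-02-16 14:10:00"), tz = "UTC"),
  t_out = as.POSIXct(c("2016-02-16 14:03:10", "2016-02-16 14:12:00"), tz = "UTC")
)

# Expand one visit into consecutive one-minute rows, keeping its Location
expand_visit <- function(i) {
  breaks <- seq(from = visits$t_in[i], to = visits$t_out[i], by = "mins")
  n <- length(breaks) - 1
  data.frame(
    Location = rep(visits$Location[i], n),
    minute   = seq_len(n),          # 1, 2, ... within this visit
    t_in     = head(breaks, -1),    # start of each minute
    t_out    = tail(breaks, -1)     # end of each minute
  )
}

expanded <- do.call(rbind, lapply(seq_len(nrow(visits)), expand_visit))
```

As with the sequences above, a trailing partial minute (the last 10 seconds of the first visit here) gets no row of its own.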

Then (tweaking your code)

ids <- seq_len(nrow(df1v2))
f <- function(x){
  a <- ids[ (df1v2$t_in < x) & (x < df1v2$t_out) ]
  if (length(a) == 0) NA else a
}   

df2$Location <- lapply(df2$date.time, f)

which yields

              date.time  code rec_id Lattitude Longitude Location
94  2016-02-16 14:56:04 32221 301978  57.14066  2.075702       37
95  2016-02-16 14:56:16 32221 301978  57.14066  2.075702       37
96  2016-02-16 14:56:27 32221 301978  57.14066  2.075702       37
97  2016-02-16 14:56:38 32221 301978  57.14066  2.075702       37
98  2016-02-16 14:56:49 32221 301978  57.14066  2.075702       38
99  2016-02-16 14:57:01 32221 301978  57.14066  2.075702       38
100 2016-02-16 14:57:12 32221 301978  57.14066  2.075702       38
101 2016-02-16 14:57:23 32221 301978  57.14066  2.075702       38
102 2016-02-16 14:57:34 32221 301978  57.14066  2.075702       38
103 2016-02-16 14:57:46 32221 301978  57.14066  2.075702       NA
104 2016-02-16 14:57:57 32221 301978  57.14066  2.075702       NA
105 2016-02-16 14:58:08 32221 301978  57.14066  2.075702       NA
106 2016-02-16 14:58:19 32221 301978  57.14066  2.075702       NA
107 2016-02-16 14:58:31 32221 301978  57.14066  2.075702       NA
108 2016-02-16 15:03:43 32221 301976 57.141869  2.081576       39
109 2016-02-16 15:04:06 32221 301976 57.141869  2.081576       39
110 2016-02-16 15:04:39 32221 301976 57.141869  2.081576       40
111 2016-02-16 15:05:02 32221 301976 57.141869  2.081576       40
112 2016-02-16 15:06:09 32221 301976 57.141869  2.081576       41
113 2016-02-16 15:09:32 32221 301976 57.141869  2.081576       NA

I'm not sure if the boundary checks are correct (modify f if not), but it looks as if you get what you are after. How important is a speed-up?
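On the boundary question: with strict inequalities at both ends, an event that lands exactly on a minute boundary matches no interval. One way to sidestep that might be half-open intervals, with the minute computed arithmetically from the time since t_in. The sketch below uses a single made-up visit of exactly five minutes; unlike the seq-based split, this arithmetic would also assign a trailing partial minute its own number.

```r
# One hypothetical visit and some events, two of them exactly on boundaries
t_in  <- as.POSIXct("2016-02-16 14:00:00", tz = "UTC")
t_out <- as.POSIXct("2016-02-16 14:05:00", tz = "UTC")
events <- as.POSIXct(c("2016-02-16 14:00:00", "2016-02-16 14:00:59",
                       "2016-02-16 14:01:00", "2016-02-16 14:04:30",
                       "2016-02-16 14:05:00"), tz = "UTC")

# Seconds elapsed since the start of the visit
elapsed <- as.numeric(difftime(events, t_in, units = "secs"))

# Half-open intervals [start, start + 60): minute 1 covers 0-59 s, and so on;
# events outside [t_in, t_out) get NA
minute <- ifelse(events >= t_in & events < t_out, floor(elapsed / 60) + 1, NA)

# Tabulate per minute, keeping minutes with zero detections
counts <- table(factor(minute, levels = 1:5))
```

Using a factor with fixed levels is what keeps the empty minutes in the tally, which is the shape of output asked for in the question.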
