kalex

Reputation: 83

Determine a period between consecutive events

I have wildlife camera trap data. Often one animal will trigger a camera repeatedly if it remains in its frame for a long period of time. I would like to identify when this occurs.

If consecutive events (rows) are less than 5 minutes apart (within a date), I assume they are one animal. In that case I would like to keep one row and discard the rest. I would also like to group by site. Here is an example of my data and the desired outcome.

Current data:

tibble::tribble(
  ~date, ~time, ~site,
    "24/08/2019", "14:44",  "A",
    "24/08/2019", "14:45",  "A",
    "24/08/2019", "14:46",  "A",
    "24/08/2019", "14:50",  "A",
    "24/08/2019", "14:47",  "B",
    "24/08/2019", "14:48",  "B",
    "24/08/2019", "17:14",  "B",
    "24/08/2019", "17:18",  "B",
    "24/08/2019", "20:04",  "B",
    "25/08/2019", "14:42",  "A"
  )
date       time   site           
24/08/2019 14:44  A                        
24/08/2019 14:45  A                        
24/08/2019 14:46  A
24/08/2019 14:50  A           
24/08/2019 14:47  B                        
24/08/2019 14:48  B
24/08/2019 17:14  B
24/08/2019 17:18  B
24/08/2019 20:04  B
25/08/2019 14:42  A

Desired outcome:

date       time   site           
24/08/2019 14:44  A                                      
24/08/2019 14:47  B                        
24/08/2019 17:14  B
24/08/2019 20:04  B
25/08/2019 14:42  A

Thank you in advance!

Upvotes: 0

Views: 169

Answers (1)

G. Grothendieck

Reputation: 269461

Using the data shown reproducibly in the Note at the end, sort by site and datetime and append a diff column giving the time in minutes between successive rows within the same site; call the result DFs. From that, derive a membership column that assigns a unique number to each run of rows that are close together, using cumsum(diff >= 5): the counter increases every time a gap of 5 minutes or more starts a new group, and the first row of each site has diff = Inf, so it always starts one. Then keep the first row in each group.

library(dplyr)

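# gap in minutes to the previous row within the same site (Inf for a site's first row)
DFs <- DF %>%
  arrange(site, datetime) %>%
  group_by(site) %>%
  mutate(diff = c(Inf, as.numeric(diff(datetime), units = "mins"))) %>%
  ungroup()

# a gap of 5 minutes or more starts a new group; keep the first row of each group
DFs %>%
  group_by(membership = cumsum(diff >= 5)) %>%
  slice(1) %>%
  ungroup()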
## # A tibble: 5 x 6
##   date       time  site  datetime             diff membership
##   <chr>      <chr> <chr> <dttm>              <dbl>      <int>
## 1 24/08/2019 14:44 A     2019-08-24 14:44:00   Inf          1
## 2 25/08/2019 14:42 A     2019-08-25 14:42:00  1432          2
## 3 24/08/2019 14:47 B     2019-08-24 14:47:00   Inf          3
## 4 24/08/2019 17:14 B     2019-08-24 17:14:00   146          4
## 5 24/08/2019 20:04 B     2019-08-24 20:04:00   166          5
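
To see why cumsum(diff >= 5) produces the group numbers, here is a minimal base R sketch that applies it to the gaps of the example data (site A rows first, then site B). The names gap, starts_new_event and first_rows are made up for this illustration and do not appear in the code above.

```r
# Gaps in minutes to the previous row, in (site, datetime) order.
# Inf marks the first row of each site, as in the diff column above.
gap <- c(Inf, 1, 1, 4, 1432, Inf, 1, 146, 4, 166)

# TRUE wherever a row is 5 or more minutes after its predecessor,
# i.e. wherever a new event begins.
starts_new_event <- gap >= 5

# Running count of the TRUEs: every row of the same event gets the same number.
membership <- cumsum(starts_new_event)
print(membership)

# Position of the first row of each event (what slice(1) keeps per group).
first_rows <- which(!duplicated(membership))
print(first_rows)
```

Note that rows are chained: a long run of detections each less than 5 minutes after the previous one counts as a single event, even if the whole run lasts much longer than 5 minutes.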

Another approach is to create an igraph graph g (see diagram at end) with one vertex per row and an edge between successive rows that are less than 5 minutes apart. The connected components of that graph give the membership column, and then we proceed as above.

library(igraph)

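nr <- nrow(DFs)
g <- make_empty_graph(n = nr)  # one vertex per row, no edges yet
wx <- which(DFs$diff < 5)      # rows less than 5 minutes after the previous row
g <- add_edges(g, c(rbind(wx - 1, wx)))  # link each such row to its predecessor
plot(g) # see plot at end

DFs$membership <- components(g)$membership

DFs %>%
  group_by(membership) %>%
  slice(1) %>%
  ungroup()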
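
As a sanity check, the sketch below compares the two ways of forming groups on the same gaps. It assumes the igraph package is installed; the names gap, by_graph and by_cumsum are made up for this illustration. The component labels are renumbered in order of first appearance before comparing, so the check does not depend on how igraph numbers the components.

```r
library(igraph)

# Gaps in minutes in (site, datetime) order; Inf marks the first row of a site.
gap <- c(Inf, 1, 1, 4, 1432, Inf, 1, 146, 4, 166)

# One vertex per row, with an edge from each "close" row to the row before it.
g <- make_empty_graph(n = length(gap))
wx <- which(gap < 5)
g <- add_edges(g, c(rbind(wx - 1, wx)))

# Grouping from connected components, relabelled in order of first appearance.
comp <- components(g)$membership
by_graph <- match(comp, unique(comp))

# Grouping from the cumulative sum used in the first approach.
by_cumsum <- cumsum(gap >= 5)

# Both approaches should put the same rows together.
same_groups <- all(by_graph == by_cumsum)
print(same_groups)
```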

(screenshot: plot of the graph g)

Note

Lines <- "
date       time   site           
24/08/2019 14:44  A                        
24/08/2019 14:45  A                        
24/08/2019 14:46  A
24/08/2019 14:50  A           
24/08/2019 14:47  B                        
24/08/2019 14:48  B
24/08/2019 17:14  B
24/08/2019 17:18  B
24/08/2019 20:04  B
25/08/2019 14:42  A"
DF <- read.table(text = Lines, header = TRUE)
DF$datetime <- as.POSIXct(paste(DF$date, DF$time), format = "%d/%m/%Y %H:%M")
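
The units = "mins" argument in the diff calculation matters because R picks the unit of a time difference automatically. The sketch below illustrates this on the gap between the last two site A rows; the names stamps, gap and gap_mins are made up for this illustration, and tz = "UTC" is an assumption added here so that daylight saving changes cannot affect the result.

```r
# Last site A row on 24/08 and the site A row on 25/08, parsed as in the Note.
stamps <- as.POSIXct(c("24/08/2019 14:50", "25/08/2019 14:42"),
                     format = "%d/%m/%Y %H:%M", tz = "UTC")

# diff() on date-times returns a difftime whose unit R chooses itself.
gap <- diff(stamps)
print(units(gap))

# Asking for minutes explicitly gives a number that is safe to compare with 5.
gap_mins <- as.numeric(gap, units = "mins")
print(gap_mins)
```

Without the explicit unit, as.numeric(gap) would return the value in whatever unit R happened to choose, so a comparison such as diff >= 5 could silently mean 5 hours or 5 seconds.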

Upvotes: 1
