Reputation: 83
I have wildlife camera trap data. Often one animal will trigger a camera repeatedly if it remains in its frame for a long period of time. I would like to identify when this occurs.
If consecutive events (rows) are less than 5 minutes apart (within a date), I assume they are one animal. I would like to keep one row and discard the rest. I would also like to group by site. Here is an example of my data and the desired outcome.
Current data:
tibble::tribble(
~date, ~time, ~site,
"24/08/2019", "14:44", "A",
"24/08/2019", "14:45", "A",
"24/08/2019", "14:46", "A",
"24/08/2019", "14:50", "A",
"24/08/2019", "14:47", "B",
"24/08/2019", "14:48", "B",
"24/08/2019", "17:14", "B",
"24/08/2019", "17:18", "B",
"24/08/2019", "20:04", "B",
"25/08/2019", "14:42", "A"
)
date time site
24/08/2019 14:44 A
24/08/2019 14:45 A
24/08/2019 14:46 A
24/08/2019 14:50 A
24/08/2019 14:47 B
24/08/2019 14:48 B
24/08/2019 17:14 B
24/08/2019 17:18 B
24/08/2019 20:04 B
25/08/2019 14:42 A
Desired outcome:
date time site
24/08/2019 14:44 A
24/08/2019 14:47 B
24/08/2019 17:14 B
24/08/2019 20:04 B
25/08/2019 14:42 A
Thank you in advance!
Upvotes: 0
Views: 169
Reputation: 269461
Using the data shown reproducibly in the Note at the end, sort the data by site and datetime and append a diff column showing the difference in minutes between successive rows in the same site, giving DFs. From that we can derive a membership column, which assigns a unique number to each set of rows that are near each other, using cumsum(diff >= 5). We then choose the first row in each group.
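To see how the running sum turns gaps into group numbers, here is a minimal base R sketch using the gaps for site A from the sample data. The vector names gap and new_event are only for illustration and are not part of the answer's code.

```r
# Minutes since the previous trigger at site A (14:44, 14:45, 14:46, 14:50,
# then 14:42 the next day). Inf marks the first row, which has no predecessor.
gap <- c(Inf, 1, 1, 4, 1432)

# TRUE wherever a new event starts, i.e. the gap is 5 minutes or more.
new_event <- gap >= 5

# The running count of event starts gives every row its group number.
membership <- cumsum(new_event)
membership
```

Rows that share a membership value belong to the same visit, so keeping the first row per value keeps one row per visit.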
library(dplyr)

DFs <- DF %>%
  arrange(site, datetime) %>%
  group_by(site) %>%
  mutate(diff = c(Inf, as.numeric(diff(datetime), units = "mins"))) %>%
  ungroup

DFs %>%
  group_by(membership = cumsum(diff >= 5)) %>%
  slice(1) %>%
  ungroup
## # A tibble: 5 x 6
## date time site datetime diff membership
## <chr> <chr> <chr> <dttm> <dbl> <int>
## 1 24/08/2019 14:44 A 2019-08-24 14:44:00 Inf 1
## 2 25/08/2019 14:42 A 2019-08-25 14:42:00 1432 2
## 3 24/08/2019 14:47 B 2019-08-24 14:47:00 Inf 3
## 4 24/08/2019 17:14 B 2019-08-24 17:14:00 146 4
## 5 24/08/2019 20:04 B 2019-08-24 20:04:00 166 5
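Since membership increases exactly at the rows where diff >= 5, the first row of every group is always a row with diff >= 5, so the grouping step could in principle be skipped. The following is a minimal base R sketch of that shortcut, not the answer's method: first_of_each_event and gap_mins are hypothetical names, and the sample timestamps are parsed with tz = "UTC" as an assumption to sidestep daylight-saving effects.

```r
# Hypothetical helper: keep only the first row of each run of triggers.
first_of_each_event <- function(df, gap_mins = 5) {
  # Sort by site, then chronologically within site.
  df <- df[order(df$site, df$datetime), ]
  # Minutes since the previous row at the same site; Inf for a site's first row.
  secs <- as.numeric(df$datetime)
  gap <- ave(secs, df$site, FUN = function(x) c(Inf, diff(x) / 60))
  # A row starts a new event when the gap reaches the threshold.
  df[gap >= gap_mins, ]
}

# Sample data from the question, with the datetime column already parsed.
DF <- data.frame(
  site = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "A"),
  datetime = as.POSIXct(
    c("24/08/2019 14:44", "24/08/2019 14:45", "24/08/2019 14:46",
      "24/08/2019 14:50", "24/08/2019 14:47", "24/08/2019 14:48",
      "24/08/2019 17:14", "24/08/2019 17:18", "24/08/2019 20:04",
      "25/08/2019 14:42"),
    format = "%d/%m/%Y %H:%M", tz = "UTC"
  )
)

res <- first_of_each_event(DF)
res
```

On the sample data this keeps the same five rows as the dplyr result above. The membership column remains useful when something other than the first row of each group is wanted, for example the last row or a count per visit.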
Another approach is to create an igraph graph g (see diagram at end) with one vertex per row and an edge between successive rows that are less than 5 minutes apart. The connected components of that graph can be used to form membership, and then we proceed as above.
library(igraph)

nr <- nrow(DFs)
g <- make_empty_graph(n = nr)
wx <- which(DFs$diff < 5)
g <- add_edges(g, c(rbind(wx - 1, wx)))
plot(g) # see plot at end

DFs$membership <- components(g)$membership
DFs %>%
  group_by(membership) %>%
  slice(1) %>%
  ungroup
Note
Lines <- "
date time site
24/08/2019 14:44 A
24/08/2019 14:45 A
24/08/2019 14:46 A
24/08/2019 14:50 A
24/08/2019 14:47 B
24/08/2019 14:48 B
24/08/2019 17:14 B
24/08/2019 17:18 B
24/08/2019 20:04 B
25/08/2019 14:42 A"
DF <- read.table(text = Lines, header = TRUE)
DF$datetime <- as.POSIXct(paste(DF$date, DF$time), format = "%d/%m/%Y %H:%M")
Upvotes: 1