Reputation: 35
I have a large ecological data set that includes deployment IDs that need fixing.
Camera traps were deployed over a ten year period and the location of the camera trap is recorded as the deployment ID. However, camera traps may have been deployed in the same location twice, at different times. In this case, the deployment ID has been recorded as the same but they need to be unique.
And to clarify: Only one camera trap can be deployed to a location at a time. But over a ten year period, some locations had multiple camera traps.
So my goal is to make them unique deployment IDs.
In my attempt: I know that camera traps were not deployed for longer than 3 months. So, using the DateTime, if more than 100 days have passed I can assume it is a redeployment and I want to assign it a unique deployment ID. So I created a column with DateDiff
for (i in 2:nrow(data)){
if (data$deploymentID[i] == data$deploymentID[i-1]){
data$DateDiff[i] <- as.Date(data$DateTime[i], format = '%Y%m%d') - as.Date(data$DateTime[i-1], format = '%Y%m%d')
}
}
Then, I thought that if I order the data, first by deployment ID, then by DateTime...
data[order(data$deploymentID, data$DateTime), ]
I would be able to use a loop that runs through the dataframe and if DateDiff is > 100, all following lines where deployment ID is the same, create unique deploymentID.
However, I am struggling to create a working loop for this last section!
Many thanks
Upvotes: 1
Views: 76
Reputation: 10375
Here is a solution with dplyr
. First group by the ID
and arrange the dates. You can then calculate the date difference with the lag
function. Afterwards, you calculate with cumsum
how often the deployment differs by more than 100 days and use this to generate a unique ID:
data <- data.frame(ID = c(rep("B1-2-41", 8), rep("B1-2-44", 8)),
date = c(rep(as.Date("2019-05-01"), 8),
rep(as.Date("2018-01-01"), 6),
as.Date("2018-05-01"), as.Date("2018-10-01")))
library(dplyr)
data_new <- data %>%
group_by(ID) %>%
arrange(date) %>%
mutate(date_diff = date - lag(date, default = first(date)),
unique_ID = paste0(ID, "_", cumsum(date_diff > 100) + 1)) %>%
ungroup() %>%
arrange(ID)
> data_new
# A tibble: 16 x 4
ID date date_diff unique_ID
<fct> <date> <drtn> <chr>
1 B1-2-41 2019-05-01 0 days B1-2-41_1
2 B1-2-41 2019-05-01 0 days B1-2-41_1
3 B1-2-41 2019-05-01 0 days B1-2-41_1
4 B1-2-41 2019-05-01 0 days B1-2-41_1
5 B1-2-41 2019-05-01 0 days B1-2-41_1
6 B1-2-41 2019-05-01 0 days B1-2-41_1
7 B1-2-41 2019-05-01 0 days B1-2-41_1
8 B1-2-41 2019-05-01 0 days B1-2-41_1
9 B1-2-44 2018-01-01 0 days B1-2-44_1
10 B1-2-44 2018-01-01 0 days B1-2-44_1
11 B1-2-44 2018-01-01 0 days B1-2-44_1
12 B1-2-44 2018-01-01 0 days B1-2-44_1
13 B1-2-44 2018-01-01 0 days B1-2-44_1
14 B1-2-44 2018-01-01 0 days B1-2-44_1
15 B1-2-44 2018-05-01 120 days B1-2-44_2
16 B1-2-44 2018-10-01 153 days B1-2-44_3
Upvotes: 1