can I solve this with a loop and how?

Question

I have a large ecological data set that includes deployment IDs that need fixing.

Camera traps were deployed over a ten year period and the location of the camera trap is recorded as the deployment ID. However, camera traps may have been deployed in the same location twice, at different times. In this case, the deployment ID has been recorded as the same but they need to be unique.

And to clarify: Only one camera trap can be deployed to a location at a time. But over a ten year period, some locations had multiple camera traps.

So my goal is to make them unique deployment IDs.

In my attempt: I know that camera traps were not deployed for longer than 3 months. So, using the DateTime, if more than 100 days have passed I can assume it is a redeployment and I want to assign it a unique deployment ID. So I created a column with DateDiff

for (i in 2:nrow(data)){
    if (data$deploymentID[i] == data$deploymentID[i-1]){
        data$DateDiff[i] <- as.Date(data$DateTime[i], format = '%Y%m%d') - as.Date(data$DateTime[i-1], format = '%Y%m%d')
    }
}

Then, I thought that if I order the data, first by deployment ID, then by DateTime...

data[order(data$deploymentID, data$DateTime), ]

I would be able to use a loop that runs through the dataframe and if DateDiff is > 100, all following lines where deployment ID is the same, create unique deploymentID.

However, I am struggling to create a working loop for this last section!

I've attached a picture to show the structure of my data

Many thanks

starja · Accepted Answer

Here is a solution with dplyr. First group by the ID and arrange the dates. You can then calculate the date difference with the lag function. Afterwards, you calculate with cumsum how often the deployment differs by more than 100 days and use this to generate a unique ID:

data <- data.frame(ID = c(rep("B1-2-41", 8), rep("B1-2-44", 8)),
                   date = c(rep(as.Date("2019-05-01"), 8),
                            rep(as.Date("2018-01-01"), 6),
                            as.Date("2018-05-01"), as.Date("2018-10-01")))
library(dplyr)

data_new <- data %>% 
  group_by(ID) %>% 
  arrange(date) %>% 
  mutate(date_diff = date - lag(date, default = first(date)),
         unique_ID = paste0(ID, "_", cumsum(date_diff > 100) + 1)) %>% 
  ungroup() %>% 
  arrange(ID)

> data_new
# A tibble: 16 x 4
   ID      date       date_diff unique_ID
                   
 1 B1-2-41 2019-05-01   0 days  B1-2-41_1
 2 B1-2-41 2019-05-01   0 days  B1-2-41_1
 3 B1-2-41 2019-05-01   0 days  B1-2-41_1
 4 B1-2-41 2019-05-01   0 days  B1-2-41_1
 5 B1-2-41 2019-05-01   0 days  B1-2-41_1
 6 B1-2-41 2019-05-01   0 days  B1-2-41_1
 7 B1-2-41 2019-05-01   0 days  B1-2-41_1
 8 B1-2-41 2019-05-01   0 days  B1-2-41_1
 9 B1-2-44 2018-01-01   0 days  B1-2-44_1
10 B1-2-44 2018-01-01   0 days  B1-2-44_1
11 B1-2-44 2018-01-01   0 days  B1-2-44_1
12 B1-2-44 2018-01-01   0 days  B1-2-44_1
13 B1-2-44 2018-01-01   0 days  B1-2-44_1
14 B1-2-44 2018-01-01   0 days  B1-2-44_1
15 B1-2-44 2018-05-01 120 days  B1-2-44_2
16 B1-2-44 2018-10-01 153 days  B1-2-44_3

can I solve this with a loop and how?

Answers (1)

Related Questions