Replacing values in a column between two given times

Question

I want to replace values in a specific column that are between two times. I know the min and max times for the values and want to replace all data points between these two times with a specific label.

I have a large dataset with many groups of data, so I will try to make a simple example here. Say that I wanted to replace "almost peak" with "peak" and I know the min/max times when these labels occurred.

points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)

Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base", "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base")

DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"), to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")

Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)

df <- data.frame(points, Status, DateTime, Group)

#for getting the min and max times of "almost peak" occurrences 
df.test <- df %>% group_by(Group) %>% 
  filter(Status == "almost peak") %>%
  summarise(
    MinTime = min(DateTime),
    MaxTime = max(DateTime)
  )

>print(df)
      points   Status       DateTime      Group
1       1        base 2021-10-16 11:37:23     1
2       2        base 2021-10-16 11:37:24     1
3       3 almost peak 2021-10-16 11:37:25     1
4       3 almost peak 2021-10-16 11:37:26     1
5       4        peak 2021-10-16 11:37:27     1
6       3 almost peak 2021-10-16 11:37:28     1
7       2        base 2021-10-16 11:37:29     1
8       1        base 2021-10-16 11:37:30     1
9      11        base 2021-10-16 11:37:31     2
10     12        base 2021-10-16 11:37:32     2
11     13 almost peak 2021-10-16 11:37:33     2
12     14        peak 2021-10-16 11:37:34     2
13     13 almost peak 2021-10-16 11:37:35     2
14     13 almost peak 2021-10-16 11:37:36     2
15     12        base 2021-10-16 11:37:37     2
16     11        base 2021-10-16 11:37:38     2

Again, I want to replace all data points that are between MinTime and MaxTime with "peak" for each Group. I have tried using mutate() with replace() as shown below, but it doesn't seem to be working. I think this is close, but it's not quite right.

df.test.replace <- df %>%
  group_by(Group) %>%
  mutate(Status = replace(Status, DateTime >= df.test$MinTime & DateTime <= df.test$MaxTime, "peak"))

As a clarification, this is my desired output. Where all the status labels between the min/max time have been replaced with "peak"

      points   Status       DateTime      Group
1       1        base 2021-10-16 11:37:23     1
2       2        base 2021-10-16 11:37:24     1
3       3        peak 2021-10-16 11:37:25     1
4       3        peak 2021-10-16 11:37:26     1
5       4        peak 2021-10-16 11:37:27     1
6       3        peak 2021-10-16 11:37:28     1
7       2        base 2021-10-16 11:37:29     1
8       1        base 2021-10-16 11:37:30     1
9      11        base 2021-10-16 11:37:31     2
10     12        base 2021-10-16 11:37:32     2
11     13        peak 2021-10-16 11:37:33     2
12     14        peak 2021-10-16 11:37:34     2
13     13        peak 2021-10-16 11:37:35     2
14     13        peak 2021-10-16 11:37:36     2
15     12        base 2021-10-16 11:37:37     2
16     11        base 2021-10-16 11:37:38     2

Any pointers would be appreciated. Thank you.

Jose · Accepted Answer

You need to index the right values to make the replacement. Try using case_when from dplyr:

library(dplyr)
df %>%
    group_by(Group) %>%
    mutate(Status = case_when(
               DateTime >= df.test$MinTime[1] &
               DateTime <= df.test$MaxTime[1] ~ "peak",
               DateTime >= df.test$MinTime[2] &
               DateTime <= df.test$MaxTime[2] ~ "peak",
               TRUE ~ as.character(Status)))

If you want to avoid indexing by hand, put all the data in the same dataframe:

df_all <- dplyr::left_join(df, df.test, by = "Group")

Then run the code using the variables "MinTime" and "MaxTime" within the same table, instead of calling from another dataframe:

df_all %>%
    mutate(Status = case_when(
               DateTime >= MinTime &
               DateTime <= MaxTime ~ "peak",
               TRUE ~ as.character(Status)))

Replacing values in a column between two given times

Answers (2)

Related Questions