new2data
new2data

Reputation: 117

Replacing values in a column between two given times

I want to replace values in a specific column that are between two times. I know the min and max times for the values and want to replace all data points between these two times with a specific label.

I have a large dataset with many groups of data, so I will try to make a simple example here. Say that I wanted to replace "almost peak" with "peak" and I know the min/max times when these labels occurred.

points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)

Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base", "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base")

DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"), to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")

Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)

df <- data.frame(points, Status, DateTime, Group)

#for getting the min and max times of "almost peak" occurrences 
df.test <- df %>% group_by(Group) %>% 
  filter(Status == "almost peak") %>%
  summarise(
    MinTime = min(DateTime),
    MaxTime = max(DateTime)
  )
>print(df)
      points   Status       DateTime      Group
1       1        base 2021-10-16 11:37:23     1
2       2        base 2021-10-16 11:37:24     1
3       3 almost peak 2021-10-16 11:37:25     1
4       3 almost peak 2021-10-16 11:37:26     1
5       4        peak 2021-10-16 11:37:27     1
6       3 almost peak 2021-10-16 11:37:28     1
7       2        base 2021-10-16 11:37:29     1
8       1        base 2021-10-16 11:37:30     1
9      11        base 2021-10-16 11:37:31     2
10     12        base 2021-10-16 11:37:32     2
11     13 almost peak 2021-10-16 11:37:33     2
12     14        peak 2021-10-16 11:37:34     2
13     13 almost peak 2021-10-16 11:37:35     2
14     13 almost peak 2021-10-16 11:37:36     2
15     12        base 2021-10-16 11:37:37     2
16     11        base 2021-10-16 11:37:38     2

Again, I want to replace all data points that are between MinTime and MaxTime with "peak" for each Group. I have tried using mutate() with replace() as shown below, but it doesn't seem to be working. I think this is close, but it's not quite right.

df.test.replace <- df %>%
  group_by(Group) %>%
  mutate(Status = replace(Status, DateTime >= df.test$MinTime & DateTime <= df.test$MaxTime, "peak"))

As a clarification, this is my desired output. Where all the status labels between the min/max time have been replaced with "peak"

      points   Status       DateTime      Group
1       1        base 2021-10-16 11:37:23     1
2       2        base 2021-10-16 11:37:24     1
3       3        peak 2021-10-16 11:37:25     1
4       3        peak 2021-10-16 11:37:26     1
5       4        peak 2021-10-16 11:37:27     1
6       3        peak 2021-10-16 11:37:28     1
7       2        base 2021-10-16 11:37:29     1
8       1        base 2021-10-16 11:37:30     1
9      11        base 2021-10-16 11:37:31     2
10     12        base 2021-10-16 11:37:32     2
11     13        peak 2021-10-16 11:37:33     2
12     14        peak 2021-10-16 11:37:34     2
13     13        peak 2021-10-16 11:37:35     2
14     13        peak 2021-10-16 11:37:36     2
15     12        base 2021-10-16 11:37:37     2
16     11        base 2021-10-16 11:37:38     2

Any pointers would be appreciated. Thank you.

Upvotes: 2

Views: 72

Answers (2)

tjebo
tjebo

Reputation: 23757

No need for case_when

df  %>% 
  group_by(Group) %>% 
  mutate(
    Status = ifelse(Status == "almost peak" & DateTime < max(DateTime) & DateTime > min(DateTime), "peak", Status)
  )
#> # A tibble: 16 × 4
#> # Groups:   Group [2]
#>    points Status DateTime            Group
#>     <dbl> <chr>  <dttm>              <dbl>
#>  1      1 base   2021-10-16 11:37:23     1
#>  2      2 base   2021-10-16 11:37:24     1
#>  3      3 peak   2021-10-16 11:37:25     1
#>  4      3 peak   2021-10-16 11:37:26     1
#>  5      4 peak   2021-10-16 11:37:27     1
#>  6      3 peak   2021-10-16 11:37:28     1
#>  7      2 base   2021-10-16 11:37:29     1
#>  8      1 base   2021-10-16 11:37:30     1
#>  9     11 base   2021-10-16 11:37:31     2
#> 10     12 base   2021-10-16 11:37:32     2
#> 11     13 peak   2021-10-16 11:37:33     2
#> 12     14 peak   2021-10-16 11:37:34     2
#> 13     13 peak   2021-10-16 11:37:35     2
#> 14     13 peak   2021-10-16 11:37:36     2
#> 15     12 base   2021-10-16 11:37:37     2
#> 16     11 base   2021-10-16 11:37:38     2

Upvotes: 1

Jose
Jose

Reputation: 421

You need to index the right values to make the replacement. Try using case_when from dplyr:

library(dplyr)
df %>%
    group_by(Group) %>%
    mutate(Status = case_when(
               DateTime >= df.test$MinTime[1] &
               DateTime <= df.test$MaxTime[1] ~ "peak",
               DateTime >= df.test$MinTime[2] &
               DateTime <= df.test$MaxTime[2] ~ "peak",
               TRUE ~ as.character(Status)))

If you want to avoid indexing by hand, put all the data in the same dataframe:

df_all <- dplyr::left_join(df, df.test, by = "Group")

Then run the code using the variables "MinTime" and "MaxTime" within the same table, instead of calling from another dataframe:

df_all %>%
    mutate(Status = case_when(
               DateTime >= MinTime &
               DateTime <= MaxTime ~ "peak",
               TRUE ~ as.character(Status)))

Upvotes: 2

Related Questions