Reputation: 117
I want to replace values in a specific column that are between two times. I know the min and max times for the values and want to replace all data points between these two times with a specific label.
I have a large dataset with many groups of data, so I will try to make a simple example here. Say that I wanted to replace "almost peak" with "peak" and I know the min/max times when these labels occurred.
points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)
Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base", "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base")
DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"), to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")
Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
df <- data.frame(points, Status, DateTime, Group)
#for getting the min and max times of "almost peak" occurrences
df.test <- df %>% group_by(Group) %>%
filter(Status == "almost peak") %>%
summarise(
MinTime = min(DateTime),
MaxTime = max(DateTime)
)
>print(df)
points Status DateTime Group
1 1 base 2021-10-16 11:37:23 1
2 2 base 2021-10-16 11:37:24 1
3 3 almost peak 2021-10-16 11:37:25 1
4 3 almost peak 2021-10-16 11:37:26 1
5 4 peak 2021-10-16 11:37:27 1
6 3 almost peak 2021-10-16 11:37:28 1
7 2 base 2021-10-16 11:37:29 1
8 1 base 2021-10-16 11:37:30 1
9 11 base 2021-10-16 11:37:31 2
10 12 base 2021-10-16 11:37:32 2
11 13 almost peak 2021-10-16 11:37:33 2
12 14 peak 2021-10-16 11:37:34 2
13 13 almost peak 2021-10-16 11:37:35 2
14 13 almost peak 2021-10-16 11:37:36 2
15 12 base 2021-10-16 11:37:37 2
16 11 base 2021-10-16 11:37:38 2
Again, I want to replace all data points that are between MinTime
and MaxTime
with "peak" for each Group.
I have tried using mutate()
with replace()
as shown below, but it doesn't seem to be working.
I think this is close, but it's not quite right.
df.test.replace <- df %>%
group_by(Group) %>%
mutate(Status = replace(Status, DateTime >= df.test$MinTime & DateTime <= df.test$MaxTime, "peak"))
As a clarification, this is my desired output. Where all the status labels between the min/max time have been replaced with "peak"
points Status DateTime Group
1 1 base 2021-10-16 11:37:23 1
2 2 base 2021-10-16 11:37:24 1
3 3 peak 2021-10-16 11:37:25 1
4 3 peak 2021-10-16 11:37:26 1
5 4 peak 2021-10-16 11:37:27 1
6 3 peak 2021-10-16 11:37:28 1
7 2 base 2021-10-16 11:37:29 1
8 1 base 2021-10-16 11:37:30 1
9 11 base 2021-10-16 11:37:31 2
10 12 base 2021-10-16 11:37:32 2
11 13 peak 2021-10-16 11:37:33 2
12 14 peak 2021-10-16 11:37:34 2
13 13 peak 2021-10-16 11:37:35 2
14 13 peak 2021-10-16 11:37:36 2
15 12 base 2021-10-16 11:37:37 2
16 11 base 2021-10-16 11:37:38 2
Any pointers would be appreciated. Thank you.
Upvotes: 2
Views: 72
Reputation: 23757
No need for case_when
df %>%
group_by(Group) %>%
mutate(
Status = ifelse(Status == "almost peak" & DateTime < max(DateTime) & DateTime > min(DateTime), "peak", Status)
)
#> # A tibble: 16 × 4
#> # Groups: Group [2]
#> points Status DateTime Group
#> <dbl> <chr> <dttm> <dbl>
#> 1 1 base 2021-10-16 11:37:23 1
#> 2 2 base 2021-10-16 11:37:24 1
#> 3 3 peak 2021-10-16 11:37:25 1
#> 4 3 peak 2021-10-16 11:37:26 1
#> 5 4 peak 2021-10-16 11:37:27 1
#> 6 3 peak 2021-10-16 11:37:28 1
#> 7 2 base 2021-10-16 11:37:29 1
#> 8 1 base 2021-10-16 11:37:30 1
#> 9 11 base 2021-10-16 11:37:31 2
#> 10 12 base 2021-10-16 11:37:32 2
#> 11 13 peak 2021-10-16 11:37:33 2
#> 12 14 peak 2021-10-16 11:37:34 2
#> 13 13 peak 2021-10-16 11:37:35 2
#> 14 13 peak 2021-10-16 11:37:36 2
#> 15 12 base 2021-10-16 11:37:37 2
#> 16 11 base 2021-10-16 11:37:38 2
Upvotes: 1
Reputation: 421
You need to index the right values to make the replacement. Try using case_when
from dplyr:
library(dplyr)
df %>%
group_by(Group) %>%
mutate(Status = case_when(
DateTime >= df.test$MinTime[1] &
DateTime <= df.test$MaxTime[1] ~ "peak",
DateTime >= df.test$MinTime[2] &
DateTime <= df.test$MaxTime[2] ~ "peak",
TRUE ~ as.character(Status)))
If you want to avoid indexing by hand, put all the data in the same dataframe:
df_all <- dplyr::left_join(df, df.test, by = "Group")
Then run the code using the variables "MinTime" and "MaxTime" within the same table, instead of calling from another dataframe:
df_all %>%
mutate(Status = case_when(
DateTime >= MinTime &
DateTime <= MaxTime ~ "peak",
TRUE ~ as.character(Status)))
Upvotes: 2