raghav
raghav

Reputation: 533

Change values from a column for two (or N) consecutive days if condition met in R

I have following dataframe:

df <- structure(list(DateTime = structure(c(1477978200, 1477980000, 1477981800, 1477983600, 1477985400, 1477987200, 1477989000, 1477990800, 1477992600, 1477994400, 1477996200, 1477998000, 1477999800, 1478001600, 1478003400, 1478005200, 1478007000, 1478008800, 1478010600, 1478012400, 1478014200, 1478016000, 1478017800, 1478019600, 1478021400, 1478023200, 1478025000, 1478026800, 1478028600, 1478030400, 1478032200, 1478034000, 1478035800, 1478037600, 1478039400, 1478041200, 1478043000, 1478044800, 1478046600, 1478048400, 1478050200, 1478052000, 1478053800, 1478055600, 1478057400, 1478059200, 1478061000, 1478062800, 1478064600, 1478066400, 1478068200, 1478070000, 1478071800, 1478073600, 1478075400, 1478077200, 1478079000, 1478080800, 1478082600, 1478084400, 1478086200, 1478088000, 1478089800, 1478091600, 1478093400, 1478095200, 1478097000, 1478098800, 1478100600, 1478102400, 1478104200, 1478106000, 1478107800, 1478109600, 1478111400, 1478113200, 1478115000, 1478116800, 1478118600, 1478120400), class = c("POSIXct", "POSIXt"), tzone = "America/Chicago"), Date = structure(c(17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107), class = "Date"), Rain_daily = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), PET_daily = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), Mask = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)), row.names = c(NA, 80L), class = "data.frame")

I want to change values in column "Mask" to "FALSE" for two consecutive days if "Rain_daily" is greater than "PET_daily" for a particular day. For example, for the given dataframe, values in Mask column will be replaced with FALSE for both the days "2016-11-01" and "2016-11-02" (as 8>7).

How can we do this? I want to avoid for looping.

Upvotes: 0

Views: 124

Answers (2)

Uwe
Uwe

Reputation: 42544

In a comment, the OP has mentioned My dataframe is really big having multiple columns.

Therefore, here is a approach which assigns by reference only the Mask column. This avoids to copy the whole dataset.

library(data.table)
setDT(df2)
df2[, Mask := !Date %in% CJ(.SD[Rain_daily > PET_daily, Date], 0:1, unique = TRUE)[, V1 + V2]]
df2
               DateTime       Date Rain_daily PET_daily  Mask
 1: 2015-01-01 00:00:00 2015-01-01          1         2  TRUE
 2: 2015-01-01 00:30:00 2015-01-01          1         2  TRUE
 3: 2015-01-01 01:00:00 2015-01-01          1         2  TRUE
 4: 2015-01-02 00:00:00 2015-01-02          3         2 FALSE
 5: 2015-01-02 00:30:00 2015-01-02          3         2 FALSE
 6: 2015-01-02 01:00:00 2015-01-02          3         2 FALSE
 7: 2015-01-03 00:00:00 2015-01-03          1         2 FALSE
 8: 2015-01-03 00:30:00 2015-01-03          1         2 FALSE
 9: 2015-01-03 01:00:00 2015-01-03          1         2 FALSE
10: 2015-01-04 00:00:00 2015-01-04          1         2  TRUE
11: 2015-01-04 00:30:00 2015-01-04          1         2  TRUE
12: 2015-01-04 01:00:00 2015-01-04          1         2  TRUE
13: 2015-01-05 00:00:00 2015-01-05          3         2 FALSE
14: 2015-01-05 00:30:00 2015-01-05          3         2 FALSE
15: 2015-01-05 01:00:00 2015-01-05          3         2 FALSE
16: 2015-01-06 00:00:00 2015-01-06          4         2 FALSE
17: 2015-01-06 00:30:00 2015-01-06          4         2 FALSE
18: 2015-01-06 01:00:00 2015-01-06          4         2 FALSE
19: 2015-01-07 00:00:00 2015-01-07          1         2 FALSE
20: 2015-01-07 00:30:00 2015-01-07          1         2 FALSE
21: 2015-01-07 01:00:00 2015-01-07          1         2 FALSE
22: 2015-01-08 00:00:00 2015-01-08          1         2  TRUE
23: 2015-01-08 00:30:00 2015-01-08          1         2  TRUE
24: 2015-01-08 01:00:00 2015-01-08          1         2  TRUE
               DateTime       Date Rain_daily PET_daily  Mask

Note that a new sample dataset has been used here which includes different days to demonstrate different use cases, see section below.

New sample dataset

As a reproducible example only three timestamps per day but at 8 different days are simulated.

library(data.table)
df2 <- fread("
      Date Rain_daily PET_daily
2015-01-01          1         2
2015-01-02          3         2
2015-01-03          1         2
2015-01-04          1         2
2015-01-05          3         2
2015-01-06          4         2
2015-01-07          1         2
2015-01-08          1         2
")[CJ(Date, DateTime = as.ITime(60 * 30 * 0:2))[
  , DateTime := as.POSIXct(Date) + DateTime], on = "Date"][
    , setcolorder(.SD, "DateTime")]
df2
               DateTime       Date Rain_daily PET_daily
 1: 2015-01-01 00:00:00 2015-01-01          1         2
 2: 2015-01-01 00:30:00 2015-01-01          1         2
 3: 2015-01-01 01:00:00 2015-01-01          1         2
 4: 2015-01-02 00:00:00 2015-01-02          3         2
 5: 2015-01-02 00:30:00 2015-01-02          3         2
 6: 2015-01-02 01:00:00 2015-01-02          3         2
 7: 2015-01-03 00:00:00 2015-01-03          1         2
 8: 2015-01-03 00:30:00 2015-01-03          1         2
 9: 2015-01-03 01:00:00 2015-01-03          1         2
10: 2015-01-04 00:00:00 2015-01-04          1         2
11: 2015-01-04 00:30:00 2015-01-04          1         2
12: 2015-01-04 01:00:00 2015-01-04          1         2
13: 2015-01-05 00:00:00 2015-01-05          3         2
14: 2015-01-05 00:30:00 2015-01-05          3         2
15: 2015-01-05 01:00:00 2015-01-05          3         2
16: 2015-01-06 00:00:00 2015-01-06          4         2
17: 2015-01-06 00:30:00 2015-01-06          4         2
18: 2015-01-06 01:00:00 2015-01-06          4         2
19: 2015-01-07 00:00:00 2015-01-07          1         2
20: 2015-01-07 00:30:00 2015-01-07          1         2
21: 2015-01-07 01:00:00 2015-01-07          1         2
22: 2015-01-08 00:00:00 2015-01-08          1         2
23: 2015-01-08 00:30:00 2015-01-08          1         2
24: 2015-01-08 01:00:00 2015-01-08          1         2
               DateTime       Date Rain_daily PET_daily

Upvotes: 1

dy_by
dy_by

Reputation: 1241

using case_when from dplyr.

if date meets the condition Rain_daily > PET_daily, chenge Mask for these (df[Rain_daily > PET_daily, Date]) and next (df[Rain_daily > PET_daily, Date]+1) days. elsewhere leave Mask as it is (TRUE ~ Mask)

library(data.table)
library(dplyr)

df <- df %>% as.data.table() # to simlify operations on df
df <- df %>% 
  mutate(Mask = case_when(Date %in% c(df[Rain_daily > PET_daily, Date],df[Rain_daily > PET_daily, Date]+1) ~ FALSE,
                          TRUE ~Mask ))

Hope that helps

Upvotes: 1

Related Questions