J34
J34

Reputation: 19

Changing values based on a condition happening in the past 48 hours?

I have a dataset where I need to change any values that are 0, but have had a non-zero value in the past 48 hours to a specific string. I'm guessing I might need to convert all but the first column from dbl to chr before doing this?

   Time                 colA             colB             colC  colD
   <dttm>              <dbl>            <dbl>            <dbl> <dbl>
 1 2021-11-21 10:00:00     8                0                9   176
 2 2021-11-11 11:00:00    21                0               22   416
 3 2021-11-21 11:00:00    19                0               20   373
 4 2021-11-11 12:00:00    40               13               28   566
 5 2021-11-21 12:00:00    26                0               27   527
 6 2021-11-11 13:00:00    50               20               32   651
 7 2021-11-11 10:00:00    11                0               12   216
 8 2021-11-21 13:00:00    30                0               31   617
 9 2021-11-11 14:00:00    51                0               32   675
10 2021-11-21 14:00:00    31                0               32   644

Apologies that the data isn't sorted by time already, working on fixing that. For example, here's the output that I'd like to get to:

   Time                 colA             colB             colC  colD
   <dttm>              <dbl>            <dbl>            <dbl> <dbl>
 1 2021-11-21 10:00:00     8                0                9   176
 2 2021-11-11 11:00:00    21                0               22   416
 3 2021-11-21 11:00:00    19                0               20   373
 4 2021-11-11 12:00:00    40               13               28   566
 5 2021-11-21 12:00:00    26                0               27   527
 6 2021-11-11 13:00:00    50               20               32   651
 7 2021-11-11 10:00:00    11                0               12   216
 8 2021-11-21 13:00:00    30                0               31   617
 9 2021-11-11 14:00:00    51          STRING1               32   675
10 2021-11-21 14:00:00    31                0               32   644

Since colB has a value of 0 at 2021-11-11 14:00:00, but atleast 1 of the previous values in the 48 hours before that != 0, it gets changed to "STRING1"

Apologies if this is confusing, I'm trying to automate something I typically do manually in Excel. Thanks in advance

Upvotes: 1

Views: 35

Answers (1)

kybazzi
kybazzi

Reputation: 1030

Here's a tidyverse solution. I'll first create some sample data (note that I do indeed set up the other columns as characters):

data = tribble(
  ~ time, ~ colA, ~ colB,
  "2021-11-21 12:00:00", 1, 0,
  "2021-11-22 00:00:00", 0, 1,
  "2021-11-24 12:00:00", 0, 0,
  "2021-11-25 12:00:00", 1, 1,
  "2021-11-26 12:00:00", 0, 0,
) %>%
  mutate(
    time = ymd_hms(time), 
    across(-time, as.character)
  )

# A tibble: 5 x 3
  time                colA  colB 
  <dttm>              <chr> <chr>
1 2021-11-21 12:00:00 1     0    
2 2021-11-22 00:00:00 0     1    
3 2021-11-24 12:00:00 0     0    
4 2021-11-25 12:00:00 1     1    
5 2021-11-26 12:00:00 0     0    

The challenge of this problem is, for each time, we need to know which other rows to look for to determine the new value of each column. To do this, I will use purrr:pmap() and the .data object available in dplyr. I'll start by demonstrating how you could "look back" at rows with a time within the last 48 hours:

data %>%
  mutate(
    across(
      .cols = -time, 
      function(col) {
        pmap_chr(list(time), function(t) {
          eligible = .data$time >= t - hours(48) & .data$time < t
          paste(col[eligible], collapse = ",")
        })
      },
      .names = "{.col}_previous"
    )
  )

Output:

# A tibble: 5 x 5
  time                colA  colB  colA_previous colB_previous
  <dttm>              <chr> <chr> <chr>         <chr>        
1 2021-11-21 12:00:00 1     0     ""            ""           
2 2021-11-22 00:00:00 0     1     "1"           "0"          
3 2021-11-24 12:00:00 0     0     ""            ""           
4 2021-11-25 12:00:00 1     1     "0"           "0"          
5 2021-11-26 12:00:00 0     0     "0,1"         "0,1"    

As you can see, this looks promising. We are ready to develop the solution using this as the key idea.

data %>%
  mutate(
    across(
      .cols = -time, 
      function(col) {
        modify_ind = pmap_lgl(list(time), function(t) {
          eligible = .data$time >= t - hours(48) & .data$time < t
          any(col[eligible] != "0")
        })
        ifelse(modify_ind & col == "0", "STRING1", col)
      }
    )
  )

Output:

# A tibble: 5 x 3
  time                colA    colB   
  <dttm>              <chr>   <chr>  
1 2021-11-21 12:00:00 1       0      
2 2021-11-22 00:00:00 STRING1 1      
3 2021-11-24 12:00:00 0       0      
4 2021-11-25 12:00:00 1       1      
5 2021-11-26 12:00:00 STRING1 STRING1

Some notes:

  • In my solution, I create a modify_ind that builds on the idea shown earlier: access the column filtered on times within the last 48 hours, and then check if there were any nonzero values. I then modify the column whenever its modify_ind is TRUE and the value of the column is "0".
  • If you wanted to create new columns instead of completely replacing the originals, add a .names parameter to across() as shown earlier.
  • If you only want to modify a couple of columns in your dataset, instead of using across(.cols = -time), use something like across(.cols = c("colA", "colB")).

Upvotes: 1

Related Questions