jontieez
jontieez

Reputation: 79

New column value based on column and previous row values in R

I'm having issues to figuring out the code for the following: We have our basic table named as shown below

Status is calculate if qty_ordered == qty_delivered then Status = "D" else Status == "N".

For Flag is where it gets more complicated: Flag == "Yes" if Status== "D", unless there's 2 "N"s on 2 previous consecutive days as you can see on May 6th for apples.

Flag == "N" if it's at least 2 consecutive days of N, as shown for apple in May4rth.

Item Date qty_ordered qty_delivered Status Flag
Apple 1-May 100 100 D YES
Apple 2-May 100 100 D YES
Apple 3-May 100 70 N YES
Apple 4-May 100 0 N NO
Apple 5-May 100 0 N NO
Apple 6-May 100 100 D NO
Apple 7-May 100 100 D YES
Banana 1-May 50 50 D YES
Banana 2-May 50 0 N YES
Banana 3-May 50 50 D YES
Banana 4-May 50 50 D YES
Banana 5-May 50 50 D YES

I usually do mutate to calculate new fields such as: df <- mutate(df,Flag= if_else(qty_ordered == qty_delivered, "YES","NO"))

but this doesn't includes the validation if the previous days that the problem needs.

Any help would be appreciated.

Upvotes: 0

Views: 3360

Answers (3)

Benjamin Christoffersen
Benjamin Christoffersen

Reputation: 4841

You can do this in base R with

# create a data.frame with only the relevant columns
dat <- data.frame(Item = c(rep("Apple", 7), rep("Banana", 5)), 
                  Status = c("D", "D", "N", "N", "N", "D", "D", 
                             "D", "N", "D", "D", "D"))

# create the flag column
transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
  ifelse(c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N), "NO", "YES")))
#R>      Item Status Flag
#R> 1   Apple      D  YES
#R> 2   Apple      D  YES
#R> 3   Apple      N  YES
#R> 4   Apple      N   NO
#R> 5   Apple      N   NO
#R> 6   Apple      D   NO
#R> 7   Apple      D  YES
#R> 8  Banana      D  YES
#R> 9  Banana      N  YES
#R> 10 Banana      D  YES
#R> 11 Banana      D  YES
#R> 12 Banana      D  YES

A faster variant is

transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
  c("YES", "NO")[
    1L + (c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N))]))

and here is a small simulation study

# perform a benchmark study with simulated data
library(dplyr)
set.seed(1)
n_lvls <- 1000L
n_per_lvl <- 6L
dat <- data.frame(
  Item = as.character(gl(n = n_lvls, n_per_lvl)), 
  Status = sample(c("D", "N"), replace = TRUE, n_per_lvl * n_lvls))

bench::mark(
  first = transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
    ifelse(c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N), "NO", "YES"))),
  faster = transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
    c("YES", "NO")[
      1L + (c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N))])), 
  dplyr = dat %>% group_by(Item) %>%
    mutate(flag = case_when(lag(Status) == 'N' & lag(Status, 2) == 'N' ~ 'NO',
                            Status == 'D' | lag(Status) == 'D' ~ 'YES', 
                            TRUE ~ 'NO')), check = FALSE)
#R> # A tibble: 3 x 13
#R>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory              time           gc              
#R>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>              <list>         <list>          
#R> 1 first        16.4ms   17.1ms     53.6      511KB     15.9    27     8      504ms <NULL> <Rprofmem [34 × 3]> <bench_tm [27… <tibble [27 × 3…
#R> 2 faster       12.1ms     14ms     58.4      511KB     15.6    30     8      514ms <NULL> <Rprofmem [34 × 3]> <bench_tm [30… <tibble [30 × 3…
#R> 3 dplyr         196ms  196.7ms      5.03     333KB     15.1     3     9      596ms <NULL> <Rprofmem [45 × 3]> <bench_tm [3]> <tibble [3 × 3]>

where dplyr is 196 / 12.1 ~ 16 times slower.

Upvotes: 0

Karthik S
Karthik S

Reputation: 11584

Does this work:

library(dplyr)
df %>% group_by(Item) %>% mutate(Flag = case_when(Status == 'N' & lag(Status == 'N') ~ 'NO', TRUE ~ 'YES'))
# A tibble: 12 x 6
# Groups:   Item [2]
   Item   Date  qty_ordered qty_delivered Status Flag 
   <chr>  <chr>       <int>         <int> <chr>  <chr>
 1 Apple  1-May         100           100 D      YES  
 2 Apple  2-May         100           100 D      YES  
 3 Apple  3-May         100            70 N      YES  
 4 Apple  4-May         100             0 N      NO   
 5 Apple  5-May         100             0 N      NO   
 6 Apple  6-May         100           100 D      YES  
 7 Apple  7-May         100           100 D      YES  
 8 Banana 1-May          50            50 D      YES  
 9 Banana 2-May          50             0 N      YES  
10 Banana 3-May          50            50 D      YES  
11 Banana 4-May          50            50 D      YES  
12 Banana 5-May          50            50 D      YES  

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388807

You can use lag to refer to previous values. Try -

library(dplyr)

df %>%
  mutate(flag = case_when(lag(Status) == 'N' & lag(Status, 2) == 'N' ~ 'NO',
                          Status == 'D' | lag(Status) == 'D' ~ 'YES', 
                          TRUE ~ 'NO'))

#     Item  Date qty_ordered qty_delivered Status Flag
#1   Apple 1-May         100           100      D  YES
#2   Apple 2-May         100           100      D  YES
#3   Apple 3-May         100            70      N  YES
#4   Apple 4-May         100             0      N   NO
#5   Apple 5-May         100             0      N   NO
#6   Apple 6-May         100           100      D   NO
#7   Apple 7-May         100           100      D  YES
#8  Banana 1-May          50            50      D  YES
#9  Banana 2-May          50             0      N  YES
#10 Banana 3-May          50            50      D  YES
#11 Banana 4-May          50            50      D  YES
#12 Banana 5-May          50            50      D  YES 

You may want to add group_by(Item) to do this separately for each Item.

Upvotes: 4

Related Questions