Reputation: 79
I'm having issues to figuring out the code for the following: We have our basic table named as shown below
Status
is calculate if qty_ordered == qty_delivered
then Status = "D"
else Status == "N"
.
For Flag
is where it gets more complicated:
Flag == "Yes"
if Status== "D"
, unless there's 2 "N"
s on 2 previous consecutive days as you can see on May 6th for apples.
Flag == "N"
if it's at least 2 consecutive days of N, as shown for apple in May4rth.
Item | Date | qty_ordered | qty_delivered | Status | Flag |
---|---|---|---|---|---|
Apple | 1-May | 100 | 100 | D | YES |
Apple | 2-May | 100 | 100 | D | YES |
Apple | 3-May | 100 | 70 | N | YES |
Apple | 4-May | 100 | 0 | N | NO |
Apple | 5-May | 100 | 0 | N | NO |
Apple | 6-May | 100 | 100 | D | NO |
Apple | 7-May | 100 | 100 | D | YES |
Banana | 1-May | 50 | 50 | D | YES |
Banana | 2-May | 50 | 0 | N | YES |
Banana | 3-May | 50 | 50 | D | YES |
Banana | 4-May | 50 | 50 | D | YES |
Banana | 5-May | 50 | 50 | D | YES |
I usually do mutate
to calculate new fields such as:
df <- mutate(df,Flag= if_else(qty_ordered == qty_delivered, "YES","NO"))
but this doesn't includes the validation if the previous days that the problem needs.
Any help would be appreciated.
Upvotes: 0
Views: 3360
Reputation: 4841
You can do this in base R with
# create a data.frame with only the relevant columns
dat <- data.frame(Item = c(rep("Apple", 7), rep("Banana", 5)),
Status = c("D", "D", "N", "N", "N", "D", "D",
"D", "N", "D", "D", "D"))
# create the flag column
transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
ifelse(c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N), "NO", "YES")))
#R> Item Status Flag
#R> 1 Apple D YES
#R> 2 Apple D YES
#R> 3 Apple N YES
#R> 4 Apple N NO
#R> 5 Apple N NO
#R> 6 Apple D NO
#R> 7 Apple D YES
#R> 8 Banana D YES
#R> 9 Banana N YES
#R> 10 Banana D YES
#R> 11 Banana D YES
#R> 12 Banana D YES
A faster variant is
transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
c("YES", "NO")[
1L + (c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N))]))
and here is a small simulation study
# perform a benchmark study with simulated data
library(dplyr)
set.seed(1)
n_lvls <- 1000L
n_per_lvl <- 6L
dat <- data.frame(
Item = as.character(gl(n = n_lvls, n_per_lvl)),
Status = sample(c("D", "N"), replace = TRUE, n_per_lvl * n_lvls))
bench::mark(
first = transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
ifelse(c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N), "NO", "YES"))),
faster = transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
c("YES", "NO")[
1L + (c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N))])),
dplyr = dat %>% group_by(Item) %>%
mutate(flag = case_when(lag(Status) == 'N' & lag(Status, 2) == 'N' ~ 'NO',
Status == 'D' | lag(Status) == 'D' ~ 'YES',
TRUE ~ 'NO')), check = FALSE)
#R> # A tibble: 3 x 13
#R> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
#R> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
#R> 1 first 16.4ms 17.1ms 53.6 511KB 15.9 27 8 504ms <NULL> <Rprofmem [34 × 3]> <bench_tm [27… <tibble [27 × 3…
#R> 2 faster 12.1ms 14ms 58.4 511KB 15.6 30 8 514ms <NULL> <Rprofmem [34 × 3]> <bench_tm [30… <tibble [30 × 3…
#R> 3 dplyr 196ms 196.7ms 5.03 333KB 15.1 3 9 596ms <NULL> <Rprofmem [45 × 3]> <bench_tm [3]> <tibble [3 × 3]>
where dplyr
is 196 / 12.1 ~ 16 times slower.
Upvotes: 0
Reputation: 11584
Does this work:
library(dplyr)
df %>% group_by(Item) %>% mutate(Flag = case_when(Status == 'N' & lag(Status == 'N') ~ 'NO', TRUE ~ 'YES'))
# A tibble: 12 x 6
# Groups: Item [2]
Item Date qty_ordered qty_delivered Status Flag
<chr> <chr> <int> <int> <chr> <chr>
1 Apple 1-May 100 100 D YES
2 Apple 2-May 100 100 D YES
3 Apple 3-May 100 70 N YES
4 Apple 4-May 100 0 N NO
5 Apple 5-May 100 0 N NO
6 Apple 6-May 100 100 D YES
7 Apple 7-May 100 100 D YES
8 Banana 1-May 50 50 D YES
9 Banana 2-May 50 0 N YES
10 Banana 3-May 50 50 D YES
11 Banana 4-May 50 50 D YES
12 Banana 5-May 50 50 D YES
Upvotes: 0
Reputation: 388807
You can use lag
to refer to previous values. Try -
library(dplyr)
df %>%
mutate(flag = case_when(lag(Status) == 'N' & lag(Status, 2) == 'N' ~ 'NO',
Status == 'D' | lag(Status) == 'D' ~ 'YES',
TRUE ~ 'NO'))
# Item Date qty_ordered qty_delivered Status Flag
#1 Apple 1-May 100 100 D YES
#2 Apple 2-May 100 100 D YES
#3 Apple 3-May 100 70 N YES
#4 Apple 4-May 100 0 N NO
#5 Apple 5-May 100 0 N NO
#6 Apple 6-May 100 100 D NO
#7 Apple 7-May 100 100 D YES
#8 Banana 1-May 50 50 D YES
#9 Banana 2-May 50 0 N YES
#10 Banana 3-May 50 50 D YES
#11 Banana 4-May 50 50 D YES
#12 Banana 5-May 50 50 D YES
You may want to add group_by(Item)
to do this separately for each Item
.
Upvotes: 4