Reputation: 3195
I have dataset
mydat <-
structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("52382MCK",
"52499MCK"), class = "factor"), item = c(11709L, 11709L, 11709L,
11709L, 11708L, 11708L, 11708L, 11710L, 11710L, 11710L, 11710L,
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L,
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L,
11710L, 11202L, 11203L, 11203L, 11204L, 11204L, 11205L, 11205L
), sales = c(30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L,
15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L,
20L, 15L, 2L, 10L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), action = c(0L,
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
1L, 1L)), row.names = c(NA, -35L), class = "data.frame")
# coerce to data.table
setDT(mydat)
with this dataset, several operations are performed.
1. selecting scenario by groups.
So there is action column. It can have only two values zero(0) or one(1).
The scenarios are the number of zero categories of action before first category of action and the number of zeros after one category of action.
For example
52382МСК 11709
it is scenario when we have 1 zero category of action col. before first category of action col , and two zeros after first category of action col. Note: maybe scenario when we have 2 zero category of action col. before first category of action col , and 1 zero after first category of action col.
mydat1
code item sales action
52382МСК 11709 30 0
52382МСК 11709 10 1
52382МСК 11709 20 0
52382МСК 11709 15 0
to detect this scenario i use this script/ This script very well works, thank for @Uwe
library(data.table)
library(magrittr)
max_zeros <- 3
zeros <- sapply(0:max_zeros, stringr::str_dup, string = "0")
names(zeros) <- as.character(nchar(zeros))
sc <- CJ(zeros.before = zeros, zeros.after = zeros)[
, scenario.name := paste(nchar(zeros.before), nchar(zeros.after), sep = "-")][
, action.pattern := sprintf("%s1+(?=%s)", zeros.before, zeros.after)][]
# special case: all zero
sc0 <- data.table(
zeros.before = NA,
zeros.after = NA,
scenario.name = "no1",
action.pattern = "^0+$")
sc <- rbind(sc0, sc)
and then
setDT(mydat)
class <- mydat[, .(scenario.name = sc$scenario.name[
paste(action, collapse = "") %>%
stringr::str_count(sc$action.pattern) %>%
is_greater_than(0) %>%
which() %>%
max()
]),
by = .(code, item)][]
class
mydat[class, on = .(code, item)]
So i get data with class of scenario.
2.operation it is replace median.
For each scenario median by zero category is calculated.
I need to calculate the median by 1 preceding zeros category by action column, i.e. which go before one category of action column, and by 2 zeros by action column that go after the one category. The median replacing performed only for first category of action column by sale column. if median is more than the sales, then do not replace it.
To do it i use the script
sales_action <- function(DF, zeros_before, zeros_after) {
library(data.table)
library(magrittr)
action_pattern <-
do.call(sprintf,
c(fmt = "%s1+(?=%s)",
stringr::str_dup("0", c(zeros_before, zeros_after)) %>% as.list()
))
message("Action pattern used: ", action_pattern)
setDT(DF)[, rn := .I]
tmp <- DF[, paste(action, collapse = "") %>%
stringr::str_locate_all(action_pattern) %>%
as.data.table() %>%
lapply(function(x) rn[x]),
by = .(code, item)][
, end := end + zeros_after]
DF[tmp, on = .(code, item, rn >= start, rn <= end),
med := as.double(median(sales[action == 0])), by = .EACHI][
, output := as.double(sales)][action == 1, output := pmin(sales, med)][
, c("rn", "med") := NULL][]
}
and then
sales_action(mydat, 1L, 2L)
so i get the result.
Each time i must manually enter the scenario to replacing by median
sales_action(mydat, 1L, 2L)
sales_action(mydat, 3L, 1L)
sales_action(mydat, 2L, 2L)
and so on.
How to do that replacing median was perform for all possible scenarios automatically so that I do not write every time sales_action(mydat, .L, .L)
So example of output
code i tem sales action output pattern
52382MCK 11709 30 0 30 01+00
52382MCK 11709 10 1 10 01+00
52382MCK 11709 20 0 20 01+00
52382MCK 11709 15 0 15 01+00
52382MCK 1170 8 0 8 01+00
52382MCK 1170 10 1 8 01+00
52382MCK 1170 2 0 2 01+00
52382MCK 1170 15 0 15 01+00
Upvotes: 0
Views: 77
Reputation: 42544
If I understand correctly, the OP wants to analyse the success of sales actions by comparing sales
figures during actions with the median sales of the periods immediately before and after the sales action.
There are some challenges:
code
, item
group.IMHO, the introduction of scenarios is a detour to handle issue 2.
The approach below
code
, item
group,output
in case the sales figure within a sales action exceeds the median of the surrounding zero action rows.The term category has been coined by the OP to distinguish between periods of sales actions (contiguous streaks of action == 1L
) and the zero action periods before and after.
library(data.table)
# coerce to data.table and create categories
setDT(mydat)[, cat := rleid(action), by = .(code, item)][]
# extract action categories, identify preceeding & succeeding zero action categories
mycat <- mydat[, .(action = first(action)), by = .(code, item, cat = cat)][
, `:=`(before = cat - 1L, after = cat + 1L)][action == 1L]
mycat
code item cat action before after
1: 52382MCK 11709 2 1 1 3
2: 52382MCK 11708 2 1 1 3
3: 52382MCK 11710 2 1 1 3
4: 52382MCK 11710 4 1 3 5
5: 52382MCK 11710 6 1 5 7
6: 52499MCK 11203 2 1 1 3
7: 52499MCK 11205 1 1 0 2
Note that group 52382MCK, 11710
includes three separate sales actions. before
and after
may point to non-existing cat
but this will be rectified automatically during the subsequent joins.
# compute median of surrouding zero action categories
action_cat_median <-
rbind(
# get sales from up to 3 zero action rows before action category
mydat[mycat, on = .(code, item, cat = before),
.(sales = tail(sales, 3), i.cat), by =.EACHI],
# get sales from up to 3 zero action rows after action category
mydat[mycat, on = .(code, item, cat = after),
.(sales = head(sales, 3), i.cat), by =.EACHI]
)[
# remove empty groups
!is.na(sales)][
# compute median for each action category
, .(med = as.double(median(sales))), by = .(code, item, cat = i.cat)]
action_cat_median
code item cat med 1: 52382MCK 11709 2 20.0 2: 52382MCK 11708 2 2.5 3: 52382MCK 11710 2 10.0 4: 52382MCK 11710 4 10.0 5: 52382MCK 11710 6 10.0 6: 52499MCK 11203 2 2.0
# prepare result
mydat[, output := as.double(sales)][
# update join
action_cat_median, on = .(code, item, cat), output := pmin(sales, med)]
Edit: Alternatively, the call to pmin()
can be replaced by a non-equi join which updates only rows where sales exceeds the median:
# prepare result, alternative approach
mydat[, output := as.double(sales)][
# non-equi update join
action_cat_median, on = .(code, item, cat, output > med), output := med]
mydat
code item sales action cat output 1: 52382MCK 11709 30 0 1 30.0 2: 52382MCK 11709 10 1 2 10.0 3: 52382MCK 11709 20 0 3 20.0 4: 52382MCK 11709 15 0 3 15.0 5: 52382MCK 11708 2 0 1 2.0 6: 52382MCK 11708 10 1 2 2.5 7: 52382MCK 11708 3 0 3 3.0 8: 52382MCK 11710 30 0 1 30.0 9: 52382MCK 11710 10 0 1 10.0 10: 52382MCK 11710 20 0 1 20.0 11: 52382MCK 11710 15 1 2 10.0 12: 52382MCK 11710 2 0 3 2.0 13: 52382MCK 11710 10 0 3 10.0 14: 52382MCK 11710 3 0 3 3.0 15: 52382MCK 11710 30 0 3 30.0 16: 52382MCK 11710 10 0 3 10.0 17: 52382MCK 11710 20 0 3 20.0 18: 52382MCK 11710 15 1 4 10.0 19: 52382MCK 11710 2 0 5 2.0 20: 52382MCK 11710 10 0 5 10.0 21: 52382MCK 11710 3 0 5 3.0 22: 52382MCK 11710 30 0 5 30.0 23: 52382MCK 11710 10 0 5 10.0 24: 52382MCK 11710 20 0 5 20.0 25: 52382MCK 11710 15 1 6 10.0 26: 52382MCK 11710 2 0 7 2.0 27: 52382MCK 11710 10 0 7 10.0 28: 52382MCK 11710 3 0 7 3.0 29: 52499MCK 11202 2 0 1 2.0 30: 52499MCK 11203 2 0 1 2.0 31: 52499MCK 11203 2 1 2 2.0 32: 52499MCK 11204 2 0 1 2.0 33: 52499MCK 11204 2 0 1 2.0 34: 52499MCK 11205 2 1 1 2.0 35: 52499MCK 11205 2 1 1 2.0 code item sales action cat output
The following rows have been updated:
mydat[output != sales]
code item sales action cat output 1: 52382MCK 11708 10 1 2 2.5 2: 52382MCK 11710 15 1 2 10.0 3: 52382MCK 11710 15 1 4 10.0 4: 52382MCK 11710 15 1 6 10.0
Upvotes: 1