Reputation: 3195
I have the script
library(dplyr)
newest=mydat %>% filter(SaleCount > 0) %>% #First filter out for SaleCount > 0 which of our interest
group_by(CustomerName,ItemRelation, DocumentNum, DocumentYear,CustomerType) %>%
mutate(k = mean(SaleCount[IsPromo==1]),
m0 = median(tail(SaleCount[IsPromo==0], 5))) %>% # Calculate m and m0 for all rows
filter(IsPromo == 1) %>% # Now keep only rows with IsPromo == 1
mutate(r = (k-m0)*n()) %>% distinct()
This script
1. calculate mean value for salecount for 1 category of Ispromo
(without negative value and zero values)
2. for zero category of ispromo , it calculates medians for 5 last obs. by salescount
(without negative value and zero values)
3. than it subtracts median from mean and multiply result on the count of non-zero and non-negative values for 1 category of ispromo
But sometimes the median can be equal =0 like in this example
mydat=structure(list(ItemRelation = c(11712L, 11712L, 11712L, 11712L,
11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L,
11712L, 11712L, 11712L), SaleCount = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DocumentNum = c(197L, 197L,
197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L,
197L, 197L), DocumentYear = c(2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), CustomerType = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), CustomerName = c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("ItemRelation",
"SaleCount", "DocumentNum", "DocumentYear", "IsPromo", "CustomerType",
"CustomerName"), class = "data.frame", row.names = c(NA, -15L
))
In this case code writes NA
and then it doesn't subtrack median from mean and doesn't multiply.
simple example
ItemRelation SaleCount DocumentNum k m0 r
11712 18 197 18 NA NA
How to do that it take into account zero median and work correct?
mean of salescount must be multiply on the count of non-zero and non-negative values for 1 category of ispromo. How to do it?
Upvotes: 0
Views: 941
Reputation: 9809
What if you set all NA to 0 and include another mutate
(mutate_all(funs(ifelse(is.na(.), 0, .)))
)?
newest=mydat %>% filter(SaleCount > 0) %>% #First filter out for SaleCount > 0 which of our interest
group_by(ItemRelation, DocumentNum, DocumentYear) %>%
mutate(k = mean(SaleCount[IsPromo==1]),
m0 = median(tail(SaleCount[IsPromo==0], 5))) %>% # Calculate m and m0 for all rows
mutate_all(funs(ifelse(is.na(.), 0, .))) %>%
filter(IsPromo == 1) %>% # Now keep only rows with IsPromo == 1
mutate(r = (k-m0)*n()) %>% distinct()
Which would result in the following:
ItemRelation SaleCount DocumentNum DocumentYear IsPromo CustomerType CustomerName k m0 r 1 11712 18 197 2017 1 1 2 18 0 18
Upvotes: 1
Reputation: 37754
There's a problem in your logic, not your code; you first say you want the median of the last five values without negative and zero values, but then say the median should be zero. But because of the first, you've removed all the zero values in your filter, so then all the values are zero (when IsPromo=0) and there's no data left to take the median of.
Upvotes: 4