psysky
psysky

Reputation: 3195

The script does not calculate the median if it is zero in R

I have the script

library(dplyr)
newest=mydat %>% filter(SaleCount > 0) %>%  #First filter out for SaleCount > 0 which of our interest
  group_by(CustomerName,ItemRelation, DocumentNum, DocumentYear,CustomerType) %>%
  mutate(k = mean(SaleCount[IsPromo==1]),
         m0 = median(tail(SaleCount[IsPromo==0], 5))) %>%  # Calculate m and m0 for all rows
  filter(IsPromo == 1) %>%  # Now keep only rows with IsPromo == 1

   mutate(r = (k-m0)*n())  %>% distinct()

This script

1. calculate mean value for salecount for 1 category of Ispromo
(without negative value and zero values)
2. for zero category of ispromo , it calculates medians for 5 last obs. by salescount
(without negative value and zero values)
3. than it subtracts median from mean and multiply result on  the count of non-zero and non-negative values for 1 category of ispromo

But sometimes the median can be equal =0 like in this example

mydat=structure(list(ItemRelation = c(11712L, 11712L, 11712L, 11712L, 
11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 
11712L, 11712L, 11712L), SaleCount = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DocumentNum = c(197L, 197L, 
197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 
197L, 197L), DocumentYear = c(2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), CustomerType = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), CustomerName = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("ItemRelation", 
"SaleCount", "DocumentNum", "DocumentYear", "IsPromo", "CustomerType", 
"CustomerName"), class = "data.frame", row.names = c(NA, -15L
))

In this case code writes NA and then it doesn't subtrack median from mean and doesn't multiply.

simple example

ItemRelation    SaleCount   DocumentNum k   m0  r
11712             18    197           18    NA  NA

How to do that it take into account zero median and work correct?

Edited for AAron answer

mean of salescount must be multiply on the count of non-zero and non-negative values for 1 category of ispromo. How to do it?

Upvotes: 0

Views: 941

Answers (2)

SeGa
SeGa

Reputation: 9809

What if you set all NA to 0 and include another mutate (mutate_all(funs(ifelse(is.na(.), 0, .))))?

newest=mydat %>% filter(SaleCount > 0) %>%  #First filter out for SaleCount > 0 which of our interest
  group_by(ItemRelation, DocumentNum, DocumentYear) %>%
  mutate(k = mean(SaleCount[IsPromo==1]),
         m0 = median(tail(SaleCount[IsPromo==0], 5))) %>%  # Calculate m and m0 for all rows
  mutate_all(funs(ifelse(is.na(.), 0, .))) %>% 
  filter(IsPromo == 1) %>%  # Now keep only rows with IsPromo == 1
  mutate(r = (k-m0)*n())  %>% distinct()

Which would result in the following:

  ItemRelation SaleCount DocumentNum DocumentYear IsPromo CustomerType CustomerName  k m0  r
1        11712        18         197         2017       1            1            2 18  0 18

Upvotes: 1

Aaron - mostly inactive
Aaron - mostly inactive

Reputation: 37754

There's a problem in your logic, not your code; you first say you want the median of the last five values without negative and zero values, but then say the median should be zero. But because of the first, you've removed all the zero values in your filter, so then all the values are zero (when IsPromo=0) and there's no data left to take the median of.

Upvotes: 4

Related Questions