psysky
psysky

Reputation: 3195

multiply values on median by group in R

I have dataset

df=structure(list(SKU = c(11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L
), stuff = c(8.85947691, 9.450108704, 10.0407405, 10.0407405, 
10.63137229, 11.22200409, 11.22200409, 11.81263588, 12.40326767, 
12.40326767, 12.40326767, 12.99389947, 13.58453126, 14.17516306, 
14.76579485, 15.94705844, 17.12832203, 17.71895382, 21.26274458, 
25.98779894, 63.19760196), action = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), 
    acnumber = c(137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("SKU", 
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA, 
-21L))

The action column has only two values 0 and 1. As we can see there is 3 observations by stuff of 1 category and 18 obs by stuff of zero category.

I need -Calculate median for the stuff variable only for category 1 (it is equal to 25.98779894) without zeros. As we can see there are zeros between one, they need to be removed, as well as negative values, if they exist. I.e, as if the dataset was like this:

structure(list(SKU = c(11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L
), stuff = c(8.85947691, 9.450108704, 10.0407405, 10.0407405, 
10.63137229, 11.22200409, 11.22200409, 11.81263588, 12.40326767, 
12.40326767, 12.40326767, 12.99389947, 13.58453126, 14.17516306, 
14.76579485, 15.94705844, 17.12832203, 17.71895382, 21.26274458, 
25.98779894, 63.19760196), action = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L), 
    acnumber = c(137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("SKU", 
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA, 
-21L))

Also i need calculate the median for the last three observations by stuff variable for category 0, which go before the first one, in our case it is 12,40326767

then subtract from the median for category 1 the median by category 0 and multiply by the number of one, in this case 3.

(25,98779894-12,40326767)*3=40,75359381

this solution

df %>%
  group_by(SKU,acnumber,year) %>%
  summarize(value = 3*(median(stuff[action==1]) - median(stuff[match(1,action)-3:1])),
            stuff=first(stuff),
            action = sum(action)) %>%
  select(SKU,stuff,action,acnumber,year,value)

by Moody_Mudskipper helped me

But!In this example, the number of ones by action was three , so we multiply by three, but the number of ones can be greater than 3 or less than three. how to multiply on real numbers of ones? for example if we have 2 ones by action for stuff,then

summarize(value = 2*(median(stuff[action==1]) - median(stuff[match(1,action)-3:1])),

so that do not manually enter it each time.

the solution sum(df$action == 1) is not suitable

summarize(value = sum(df$action == 1)*(median(stuff[action==1]) - median(stuff[match(1,action)-3:1])),

because it sums up all the ones of dataset and then there is incorrect multiplication. Total count of ones=692 and this number multiplies

 summarize(value = 692*(median(stuff[action==1]) - median(stuff[match(1,action)-3:1])),

it is wrong The multiplication of ones must be for each particular group SKU,acnumber,year

111-23-2018 is first group has 3 ones
112-24-2018 is second group has 2 ones

and so on

How to do it correct?

Upvotes: 0

Views: 150

Answers (1)

Onyambu
Onyambu

Reputation: 79208

df%>%
   group_by(SKU,acnumber,year)%>%
   summarise(s=sum(action),k=which(action==1)[1],
            l=s*(median(stuff[action==1])-median(stuff[(k-s+1):k])))%>%
   data.frame()
    SKU acnumber year s  k        l
1 11202      137 2018 3 11 40.75359

Upvotes: 1

Related Questions