Pallavi
Pallavi

Reputation: 313

Calculate mean of respective column values based on condition

I have a data.frame named sampleframe where I have stored all the table values. Inside sampleframe I have columns id, month, sold.

 id month   SMarch  SJanFeb churn
 101    1   0.00    0.00    1
 101    2   0.00    0.00    1
 101    3   0.00    0.00    1
 108    2   0.00    6.00    1
 103    2   0.00    10.00   1
 160    1   0.00    2.00    1
 160    2   0.00    3.00    1
 160    3   0.50    0.00    0
 164    1   0.00    3.00    1
 164    2   0.00    6.00    1

I would like to calculate average sold for last three months based on ID. If it is month 3 then it has to consider average sold for the last two months based on ID, if it is month 2 then it has to consider average sold for 1 month based on ID., respectively for all months.

I have used ifelse and mean function to avail it but some rows are missing when i try to use it for all months

Query that I have used for execution

sampleframe$Churn <- ifelse(sampleframe$Month==4|sampleframe$Month==5|sampleframe$Month==6, ifelse(sampleframe$Sold<0.7*mean(sampleframe$Sold[sampleframe$ID[sampleframe$Month==-1&sampleframe$Month==-2&sampleframe$Month==-3]]),1,0),0)

adding according to the logic of the query it should compare with the previous months sold value of 70% and if the current value is higher than previous average months values then it should return 1 else 0

Upvotes: 1

Views: 726

Answers (2)

Pallavi
Pallavi

Reputation: 313

Solution for above Question can be done by using library(dplyr) and use this query to avail the output

resultData <- group_by(data, KId) %>% 
    arrange(sales_month) %>% 
    mutate(monthMinus1Qty = lag(quantity_sold,1), monthMinus2Qty = lag(quantity_sold, 2)) %>% 
    group_by(KId, sales_month) %>%
    mutate(previous2MonthsQty = sum(monthMinus1Qty, monthMinus2Qty, na.rm = TRUE)) %>%  
    mutate(result = ifelse(quantity_sold/previous2MonthsQty >= 0.6,0,1)) %>%
    select(KId,sales_month, quantity_sold, result)

link to refer for solution and output Answer

Upvotes: 0

akrun
akrun

Reputation: 886938

Not clear about the expected output. Based on the description about calculating average 'sold' for each 3 months, grouped by 'id', we can use roll_mean from library(RcppRoll). We convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'id', if the number of rows is greater than 1, we get the roll_mean with n specified as 3 and concatenate with the averages for less than 3 or else i.e. for 1 observation, get the value itself.

library(RcppRoll)
library(data.table)
k <- 3
setDT(df1)[, soldAvg := if(.N>1) c(cumsum(sold[1:(k-1)])/1:(k-1), 
      roll_mean(sold,n=k, align='right')) else as.numeric(sold), id]

df1
 #     id month sold  soldAvg
 #1: 101     1  124 124.0000
 #2: 101     2  211 167.5000
 #3: 104     3  332 332.0000
 #4: 105     4  124 124.0000
 #5: 101     5  211 182.0000
 #6: 101     6  332 251.3333
 #7: 101     7  124 222.3333
 #8: 101     8  211 222.3333
 #9: 101     9  332 222.3333
#10: 102    10  124 124.0000
#11: 102    12  211 167.5000
#12: 104     3  332 332.0000
#13: 105     4  124 124.0000
#14: 102     5  211 182.0000
#15: 102     6  332 251.3333
#16: 106     7  124 124.0000
#17: 107     8  211 211.0000
#18: 102     9  332 291.6667
#19: 103    11  124 124.0000
#20: 103     2  211 167.5000
#21: 108     3  332 332.0000
#22: 108     4  124 228.0000
#23: 109     5  211 211.0000
#24: 103     6  332 222.3333
#25: 104     7  124 262.6667
#26: 105     8  211 153.0000
#27: 103    10  332 291.6667

Upvotes: 1

Related Questions