user9170959

Reputation: 35

How to apply a function to a quartile subset?

I have a data set that tracks sales (logmove) against certain customer characteristics, and I am trying to summarize sales by quartile of one characteristic (INCOME). I have created the quartile ranges of INCOME, but I cannot figure out how to compute the average sales within each quartile.

Code so far:

oj = read.csv("oj.csv")
dom = (subset(oj, brand == "dominicks"))
summary(dom$INCOME)

applyQuant = function(x){
  cut(x, breaks = c(quantile(dom$INCOME, probs = seq(0,1, by = 0.25))), 
      labels = c("Q1", "Q2", "Q3", "Q4"), include.lowest = TRUE)
}
dom.quant = sapply(dom$INCOME, applyQuant)

Basically, I need four groups of incomes (x variable) based on quartiles, and then the average sales (y value) for each quartile range.

Upvotes: 1

Views: 118

Answers (1)

Parfait

Reputation: 107652

Simply assign the quartile result as a new column, then run ave or aggregate to get the average sales per quartile.

dom$quant <- sapply(dom$INCOME, applyQuant)

In fact, cut is vectorized and does not require a loop such as sapply, so assign the column directly:

dom$quant <- cut(dom$INCOME, 
                 breaks = c(quantile(dom$INCOME, probs = seq(0,1, by = 0.25))), 
                 labels = c("Q1", "Q2", "Q3", "Q4"), include.lowest = TRUE)

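# NEW COLUMN AGGREGATION: ave() repeats each quartile's mean on every row
# (logmove is the sales column named in the question)
dom$quant_sales_mean <- with(dom, ave(logmove, quant, FUN=mean))
dom

# NEW DATA FRAME AGGREGATION: aggregate() returns one row per quartile
agg_df <- aggregate(logmove ~ quant, dom, mean)
agg_df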
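
Since oj.csv is not included here, the following is a minimal, self-contained sketch of the same steps on made-up data. The data frame toy and its values are assumptions for illustration only; the column names INCOME and logmove follow the question. It also checks that the single vectorized cut call gives the same grouping as the element-by-element sapply version, and shows tapply as one more way to get the four means.

```r
# Synthetic stand-in for the question's data (values are invented).
set.seed(42)
toy <- data.frame(
  INCOME  = rnorm(200, mean = 10.6, sd = 0.3),
  logmove = rnorm(200, mean = 9, sd = 1)
)

# Quartile break points of INCOME (0%, 25%, 50%, 75%, 100%).
breaks <- quantile(toy$INCOME, probs = seq(0, 1, by = 0.25))
labs <- c("Q1", "Q2", "Q3", "Q4")

# cut() is vectorized: one call labels every row.
toy$quant <- cut(toy$INCOME, breaks = breaks, labels = labs,
                 include.lowest = TRUE)

# Element-by-element version, as in the question, for comparison only.
looped <- sapply(toy$INCOME, function(x) {
  cut(x, breaks = breaks, labels = labs, include.lowest = TRUE)
})
same_groups <- all(as.integer(looped) == as.integer(toy$quant))

# ave(): one group mean per row, aligned with the original rows.
toy$quant_sales_mean <- with(toy, ave(logmove, quant, FUN = mean))

# aggregate(): one row per quartile.
agg_df <- aggregate(logmove ~ quant, data = toy, FUN = mean)

# tapply(): the same four means as a named vector.
means <- tapply(toy$logmove, toy$quant, mean)

print(same_groups)
print(agg_df)
```

Use ave when the group mean needs to sit next to each observation (for plotting or centering), and aggregate or tapply when only the four-row summary is needed.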

Upvotes: 2
