I have a data set that tracks sales (logmove) along with certain customer characteristics, and I am trying to find the quartile ranges for one characteristic (INCOME). I have created the quartile ranges of INCOME, but I cannot figure out how to find the average sales for each quartile.
Code so far:
oj = read.csv("oj.csv")
# Keep only the Dominick's brand rows
dom = subset(oj, brand == "dominicks")
summary(dom$INCOME)
# Label an income value by the quartile of dom$INCOME it falls in
applyQuant = function(x){
  cut(x, breaks = quantile(dom$INCOME, probs = seq(0, 1, by = 0.25)),
      labels = c("Q1", "Q2", "Q3", "Q4"), include.lowest = TRUE)
}
dom.quant = sapply(dom$INCOME, applyQuant)
Basically, I need four groups of incomes (the x variable) based on quartiles, and then the average sales (the y value) for each quartile range.
Simply assign the quartile result as a new column, then run ave or aggregate to get the average sales.
dom$quant <- sapply(dom$INCOME, applyQuant)
In fact, cut is vectorized and does not require a loop such as sapply, so assign the column directly:
dom$quant <- cut(dom$INCOME,
                 breaks = quantile(dom$INCOME, probs = seq(0, 1, by = 0.25)),
                 labels = c("Q1", "Q2", "Q3", "Q4"), include.lowest = TRUE)
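A self-contained check that does not need oj.csv: the sketch below builds a hypothetical toy data frame (the column names INCOME and logmove mirror the question, but the values are simulated, not real data) and compares the single vectorized cut call with the element-by-element sapply version.

```r
set.seed(1)
# Hypothetical toy data standing in for the dominicks subset (simulated values)
toy <- data.frame(INCOME  = rnorm(100, mean = 10.5, sd = 0.3),
                  logmove = rnorm(100, mean = 9, sd = 1))

# Quartile breakpoints: min, 25%, 50%, 75%, max
breaks <- quantile(toy$INCOME, probs = seq(0, 1, by = 0.25))

# Vectorized: one call labels every row
toy$quant <- cut(toy$INCOME, breaks = breaks,
                 labels = c("Q1", "Q2", "Q3", "Q4"), include.lowest = TRUE)

# Element-by-element version, as in the question, for comparison
looped <- sapply(toy$INCOME, function(x)
  cut(x, breaks = breaks, labels = c("Q1", "Q2", "Q3", "Q4"),
      include.lowest = TRUE))

print(identical(toy$quant, looped))
print(table(toy$quant))
```

Both approaches should yield the same factor; the vectorized call simply avoids one cut call per row. With 100 distinct simulated values, each quartile holds exactly 25 rows.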
# NEW COLUMN AGGREGATION (sales are stored in the logmove column)
dom$quant_sales_mean <- with(dom, ave(logmove, quant, FUN = mean))
dom
# NEW DATA FRAME AGGREGATION
agg_df <- aggregate(logmove ~ quant, dom, mean)
agg_df
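To see how the two aggregation styles differ, here is a sketch on a tiny made-up data frame (the values are invented purely for illustration). ave() returns one value per row, so every row carries its quartile's mean, while aggregate() returns one row per quartile. tapply() is shown as a further base R option, not part of the answer above, that returns a named vector of group means.

```r
# Hypothetical toy data: two rows per quartile, values invented for illustration
toy <- data.frame(
  quant   = factor(c("Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q4")),
  logmove = c(9, 11, 8, 10, 7, 9, 6, 8)
)

# One mean per row: same length as the data (8 values)
toy$quant_sales_mean <- with(toy, ave(logmove, quant, FUN = mean))

# One row per quartile: a 4-row data frame with columns quant and logmove
agg_df <- aggregate(logmove ~ quant, data = toy, FUN = mean)

# Named vector of quartile means
means <- tapply(toy$logmove, toy$quant, mean)

print(toy)
print(agg_df)
print(means)
```

Use the ave() column when you want the group mean next to each observation (for example, to compute deviations from it), and aggregate() or tapply() when you only need the summary table.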