Head and toes

Reputation: 659

Using tapply with multiple variables

I have a data set with information about customers and how much each has spent. Each customer appears only once:

customer<-c("Andy","Bobby","Oscar","Oliver","Jane","Cathy","Emma","Chris")
age<-c(25,34,20,35,23,35,34,22)
gender<-c("male","male","male","male","female","female","female","female")
moneyspent<-c(100,100,200,200,400,400,500,200)

data<-data.frame(customer=customer,age=age,gender=gender,moneyspent=moneyspent)

If I want to calculate the average amount of money spent by male and female customers, I can use tapply:

tapply(moneyspent,gender,mean)

which gives:

female   male 
  375    150

However, I now want the average amount of money spent by both gender and age group. The result I am aiming for is:

Male Age 20-30   Female Age 20-30   Male Age 30-40   Female Age 30-40
           150                300              150                450

How could I modify the tapply code so that it gives these results?

THANK YOU

Upvotes: 1

Views: 5659

Answers (2)

agent18

Reputation: 2297

Using the plyr package:

library(plyr)

ddply(data,.(gender, age=cut(age, breaks=c(20,30,40), 
                  include.lowest=TRUE)), summarize, moneyspent=mean(moneyspent))

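This also gives the same result.

Note: summarize and summarise are aliases for the same function.

Warning: loading plyr after dplyr masks dplyr's summarise (along with summarize, mutate, arrange and others). Either detach plyr before using those functions again, or call them with an explicit namespace, e.g. dplyr::summarise.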
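
As a minimal sketch of the masking issue, assuming both plyr and dplyr are installed: the data frame below is a cut-down copy of the question's data, and by_gender is a name made up for illustration. With plyr attached second, an unqualified summarise would resolve to plyr's version, which does not use dplyr's grouping, so the calls are prefixed with dplyr:: instead.

```r
library(dplyr)
library(plyr)   # attached after dplyr, so plyr::summarise masks dplyr::summarise

# Cut-down copy of the question's data (gender and spend only)
data <- data.frame(
  gender     = c("male", "male", "male", "male",
                 "female", "female", "female", "female"),
  moneyspent = c(100, 100, 200, 200, 400, 400, 500, 200)
)

# The explicit dplyr:: prefix bypasses the masking, so the grouping is honoured
by_gender <- data %>%
  dplyr::group_by(gender) %>%
  dplyr::summarise(moneyspent = mean(moneyspent))

by_gender
```

This should return one row per gender, matching the tapply result from the question (375 for female, 150 for male).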

Upvotes: 0

akrun

Reputation: 887851

You may need to use cut to bin age into groups, then pass both grouping variables to tapply as a list:

mat <- tapply(moneyspent, list(gender, age=cut(age, breaks=c(20,30,40), 
                include.lowest=TRUE)), mean)

nm1 <- outer(rownames(mat), colnames(mat), FUN=paste)
setNames(c(mat), nm1)
#female [20,30]   male [20,30] female (30,40]   male (30,40] 
#       300            150            450            150 
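
If the names should look more like the ones in the question ("Age 20-30" rather than "[20,30]"), one possible variation is to pass a labels argument to cut. This is only a sketch: the label strings and the agegroup and res names are made up here, and with these breaks an age of exactly 30 falls into the first group.

```r
age        <- c(25, 34, 20, 35, 23, 35, 34, 22)
gender     <- c("male", "male", "male", "male",
                "female", "female", "female", "female")
moneyspent <- c(100, 100, 200, 200, 400, 400, 500, 200)

# Bin ages into [20,30] and (30,40], with readable labels (illustrative names)
agegroup <- cut(age, breaks = c(20, 30, 40),
                labels = c("Age 20-30", "Age 30-40"),
                include.lowest = TRUE)

# Mean spend for every gender x age-group combination (a 2 x 2 matrix)
mat <- tapply(moneyspent, list(gender, agegroup), mean)

# Flatten the matrix column by column and name each cell "<gender> <age group>"
res <- setNames(c(mat), c(outer(rownames(mat), colnames(mat), paste)))
res
```

The four means are 300 (female, 20-30), 150 (male, 20-30), 450 (female, 30-40) and 150 (male, 30-40), which agrees with the figures asked for in the question.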

Other options include

library(dplyr)
data %>% 
     group_by(gender, age=cut(age, breaks=c(20,30,40), 
              include.lowest=TRUE)) %>% 
     summarise(moneyspent=mean(moneyspent))

Or

 library(data.table)
 setDT(data)[, list(moneyspent=mean(moneyspent)),
     by=list(gender, age=cut(age, breaks= c(20,30,40), include.lowest=TRUE))]

Upvotes: 2
