nathaneastwood

Reputation: 3764

Removing a loop in lapply

I have a loop which I would like to get rid of; I just can't quite see how to. Say I have a data frame:

tmp = data.frame(Gender = rep(c("Male", "Female"), each = 6), 
                 Ethnicity = rep(c("White", "Asian", "Other"), 4),
                 Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

I then want to calculate the mean Score for each level of both the Gender and Ethnicity columns, which would give:

$Female
[1] 9.5

$Male
[1] 3.5

$Asian
[1] 6.5

$Other
[1] 7.5

$White
[1] 5.5

This is easy enough to do, but I don't want to use loops; I'm going for speed. So I currently have the following:

for(i in c("Gender", "Ethnicity"))
    print(lapply(split(tmp$Score, tmp[, i]), function(x) mean(x)))

Obviously, this uses a loop and is where I am stuck.

There may well be a function that already does this kind of thing that I am unaware of. I have looked at aggregate, but I don't think that's what I want.

Upvotes: 3

Views: 206

Answers (6)

John

Reputation: 23758

You should probably reconsider the output you're generating. A list containing all of the ethnicity and gender levels together is probably not the best way to go about graphing, analyzing, or presenting your data. You might be better off breaking down and writing two lines of code instead of that one, perhaps using tapply:

tapply(tmp$Score, tmp$Gender, mean)
tapply(tmp$Score, tmp$Ethnicity, mean)

or aggregate

aggregate(Score ~ Gender, tmp, mean)
aggregate(Score ~ Ethnicity, tmp, mean)

You might also want to look at the interaction, even though you suggested aggregate doesn't do what you really want:

with(tmp, tapply(Score, list(Gender, Ethnicity), mean))
aggregate(Score ~ Gender + Ethnicity, tmp, mean)

Not only do these give you a better separation and presentation of the fundamental ideas behind the variables, but your R commands also become more expressive and better reflect why those variables were coded separately in the first place.

If your real task is to handle a number of variables, any of these can be put into a loop, but I would suggest you still want the output not as one single list but as a list of vectors or data.frames.
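For example, a minimal sketch of that loop in base R. It assumes the grouping columns are listed in a hypothetical `group_cols` vector, builds each formula with `reformulate()`, and returns a named list of data frames (one per variable) rather than one flat list:

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Hypothetical: the grouping columns you want to summarise by
group_cols <- c("Gender", "Ethnicity")

# One data frame of means per grouping column; reformulate() builds
# Score ~ Gender, then Score ~ Ethnicity
results <- setNames(
  lapply(group_cols, function(col)
    aggregate(reformulate(col, response = "Score"), data = tmp, FUN = mean)),
  group_cols
)

results$Gender
results$Ethnicity
```

Each element keeps its own grouping column name, so `results$Ethnicity` can be plotted or tabulated on its own.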

Upvotes: 0

tohweizhong

Reputation: 120

Try the reshape2 package.

library(reshape2)

# demo
melted <- melt(tmp)
casted.gender <- dcast(melted, Gender ~ variable, mean)  # mean of each gender
casted.eth <- dcast(melted, Ethnicity ~ variable, mean)  # mean of each ethnicity

# now, combining to do all grouping variables at once
variables <- colnames(tmp)[-length(colnames(tmp))]  # every column except Score (the last one)

casting <- function(var.name) {
    return(dcast(melted, melted[, var.name] ~ melted$variable, mean))
}

lapply(variables, FUN = casting)

Output:

[[1]]
  melted[, var.name] Score
1             Female   9.5
2               Male   3.5

[[2]]
  melted[, var.name] Score
1              Asian   6.5
2              Other   7.5
3              White   5.5

Upvotes: 1

akrun

Reputation: 887128

Using dplyr and tidyr:

 library(dplyr)
 library(tidyr)
 tmp[,1:2] <- lapply(tmp[,1:2], as.character)
 tmp %>% 
     gather(Var1, Var2, Gender:Ethnicity) %>%
     unite(Var, Var1, Var2) %>% 
     group_by(Var) %>% 
     summarise(Score=mean(Score))

  #              Var Score
  #1 Ethnicity_Asian   6.5
  #2 Ethnicity_Other   7.5
  #3 Ethnicity_White   5.5
  #4   Gender_Female   9.5
  #5     Gender_Male   3.5
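For comparison, the same long-format idea can be sketched in base R without dplyr or tidyr. This is an illustrative equivalent rather than part of the answer above; `long` and `res` are hypothetical names:

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Stack both grouping columns into one key column, prefixing each value
# with its column name (the role gather() + unite() play above)
long <- data.frame(
  Var   = c(paste0("Gender_", tmp$Gender), paste0("Ethnicity_", tmp$Ethnicity)),
  Score = rep(tmp$Score, times = 2)
)

# One mean per key (the role group_by() + summarise() play above)
res <- aggregate(Score ~ Var, data = long, FUN = mean)
res
```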

Upvotes: 2

anonR

Reputation: 929

You can use the code:

c(tapply(tmp$Score,tmp$Gender,mean),tapply(tmp$Score,tmp$Ethnicity,mean))
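The same idea extends to any number of grouping columns without writing out each tapply() call. A minimal sketch, where `group_cols` is a hypothetical vector of column names; note that lapply() is itself a loop, so this is about tidiness rather than speed:

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Hypothetical: the grouping columns to summarise by
group_cols <- c("Gender", "Ethnicity")

# One named vector of means per column, then c() joins them into a
# single named vector (Female, Male, Asian, Other, White)
means <- do.call(c, lapply(group_cols, function(col)
  tapply(tmp$Score, tmp[[col]], mean)))
means
```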

Upvotes: 2

arvi1000

Reputation: 9582

You can nest apply functions.

sapply(c("Gender", "Ethnicity"),
       # no print() needed: the result is returned (and auto-printed) once
       function(i) lapply(split(tmp$Score, tmp[, i]), mean),
       simplify = FALSE)
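If you want exactly the flat list shown in the question (`$Female`, `$Male`, `$Asian`, ...) rather than a list of lists, the inner lists can be concatenated with c(). A sketch, with `cols`, `by_column` and `flat` as hypothetical names:

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

cols <- c("Gender", "Ethnicity")

# Inner lapply: mean per level; outer lapply: one inner list per column
by_column <- lapply(cols, function(col)
  lapply(split(tmp$Score, tmp[[col]]), mean))

# Concatenate the unnamed outer list's elements into one flat named list
flat <- do.call(c, by_column)
flat
```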

Upvotes: 2

Stephan Kolassa

Reputation: 8267

You can sapply() over the names of tmp, except for Score, and then use by() (or aggregate()):

> sapply(setdiff(names(tmp),"Score"),function(xx)by(tmp$Score,tmp[,xx],mean))
$Gender
tmp[, xx]: Female
[1] 9.5
------------------------------------------------------------ 
tmp[, xx]: Male
[1] 3.5

$Ethnicity
tmp[, xx]: Asian
[1] 6.5
------------------------------------------------------------ 
tmp[, xx]: Other
[1] 7.5
------------------------------------------------------------ 
tmp[, xx]: White
[1] 5.5

However, sapply() still loops internally, so don't expect a big speed-up.
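If speed really matters, the loop over two or three columns is rarely the expensive part; the per-group work is. One base-R option is rowsum(), which computes all group sums for a column in a single call to compiled code, so no R function is called once per group. This is a sketch, not a benchmarked claim: `group_means` is a hypothetical helper, and whether it is noticeably faster depends on your data, so time it yourself.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Hypothetical helper: group means as (group sums) / (group sizes)
group_means <- function(x, group) {
  sums   <- rowsum(x, group)                  # one row per group, sorted
  counts <- rowsum(rep(1, length(x)), group)  # group sizes, same row order
  setNames(as.vector(sums / counts), rownames(sums))
}

# Each grouping column is passed as `group`; Score is fixed as `x`
res <- lapply(tmp[c("Gender", "Ethnicity")], group_means, x = tmp$Score)
res
```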

Upvotes: 3
