Faiz Lotfy

Reputation: 121

How to use tapply from within a for loop

I have a data.frame "df" with 200 observations and 18 columns, named var1, var2, etc. When I use:

tapply(df$var1, INDEX=df$varX, FUN=mean, na.rm=T)

where varX is one particular column of type string (the grouping variable), I get the mean of var1 for each value of varX. My question is: how can I put the above command in a for loop so that it covers all variables (var1, var2, etc.) except, of course, varX? I tried this:

for (k in c(var1, var2, ..., varn)) {
tapply(df$k, INDEX=df$varX, FUN=mean, na.rm=T)
}

But it did not work.

Please note: I am sure much more effective and elegant methods/scripts can be used, but since I am a beginner, and so much behind, I sometimes try to go ahead and apply some ideas before I finish reading the respective chapter of a book I have. This is why my method(s) sometimes look primitive.

Upvotes: 1

Views: 1555

Answers (3)

Rich Scriven

Reputation: 99331

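You could use rowsum(), which is one of the fastest base R aggregation functions (although here we need to divide the sums by the counts of the grouping variable to get the means). Note that this division assumes there are no missing values: na.rm = TRUE drops NAs from the sums, but the group counts still include those rows.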

Following BrodieG's example using data(iris) grouped by Species, we can do

grp <- iris$Species
rowsum(iris[-5], grp, na.rm = TRUE) / tabulate(grp, nlevels(grp))
#            Sepal.Length Sepal.Width Petal.Length Petal.Width
# setosa            5.006       3.428        1.462       0.246
# versicolor        5.936       2.770        4.260       1.326
# virginica         6.588       2.974        5.552       2.026
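
If the columns may contain NAs, one possible adjustment is to divide by the number of non-missing values per group and column rather than by the overall group size. The following is only a sketch under that assumption; the NA planted in the first cell and the names x, sums, counts and means are illustrative, not part of the answer above:

```r
grp <- iris$Species
x <- as.matrix(iris[-5])
x[1, 1] <- NA                          # illustrative missing value (setosa, Sepal.Length)

sums   <- rowsum(x, grp, na.rm = TRUE) # per-group sums, NAs skipped
counts <- rowsum(+(!is.na(x)), grp)    # per-group counts of non-missing values
means  <- sums / counts                # NA-aware group means
means
```

With this data, the result should agree with what tapply(..., mean, na.rm = TRUE) gives for each column, whereas dividing by tabulate(grp, nlevels(grp)) would slightly understate the setosa mean of Sepal.Length.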

Upvotes: 1

BrodieG

Reputation: 52637

The most direct adaptation of what you are looking for (using iris as the example data frame) is:

for(k in iris[-5])   # we loop through the columns in `iris`, except last
  print(tapply(k, INDEX=iris$Species, FUN=mean, na.rm=T))

Which produces:

setosa versicolor  virginica 
 5.006      5.936      6.588 
setosa versicolor  virginica 
 3.428      2.770      2.974 
setosa versicolor  virginica 
 1.462      4.260      5.552 
setosa versicolor  virginica 
 0.246      1.326      2.026 
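
As for why the attempt in the question fails: df$k looks for a column literally named "k", so the loop variable is never substituted. A minimal sketch of a loop over column names, using [[ ]] to index by the name stored in k, might look like this (value_cols and results are illustrative names, and iris stands in for the asker's df):

```r
# Every column except the grouping column
value_cols <- setdiff(names(iris), "Species")

results <- list()
for (k in value_cols) {
  # iris[[k]] fetches the column whose name is stored in k
  results[[k]] <- tapply(iris[[k]], INDEX = iris$Species, FUN = mean, na.rm = TRUE)
}

results$Sepal.Length
```

Storing the results in a named list, rather than printing them, keeps each per-group mean available for later use.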

Slightly more elegantly using sapply instead of for:

sapply(iris[-5], tapply, INDEX=iris$Species, mean, na.rm=T)

which produces:

           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa            5.006       3.428        1.462       0.246
versicolor        5.936       2.770        4.260       1.326
virginica         6.588       2.974        5.552       2.026

But really, you want to use aggregate, dplyr, or data.table as others have suggested:

data.table(iris)[, lapply(.SD, mean, na.rm=TRUE), by=Species]
iris %>% group_by(Species) %>% summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) # across() replaces the deprecated summarise_each()/funs()
aggregate(. ~ Species, iris, mean, na.rm = TRUE) # Courtesy David Arenburg

The first two require loading the packages data.table and dplyr, respectively.

Upvotes: 1

Metrics

Reputation: 15458

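library(dplyr)
df %>%
  na.omit() %>%        # drops every row that has an NA in any column
  group_by(varX) %>%
  # across() replaces the deprecated summarise_each(funs(mean))
  summarise(across(everything(), mean))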
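
Note that na.omit() on the whole data frame is not equivalent to na.rm = TRUE inside mean(): the former discards entire rows, so an NA in one column also removes that row's values from every other column. A small base R sketch with made-up data (toy, by_row and by_col are hypothetical names) illustrates the difference:

```r
# Hypothetical toy data: one NA in var1, none in var2
toy <- data.frame(
  varX = c("a", "a", "b", "b"),
  var1 = c(1, NA, 3, 5),
  var2 = c(10, 20, 30, 40)
)

# Row-wise removal: the second row disappears for var2 as well
complete <- na.omit(toy)
by_row <- tapply(complete$var2, complete$varX, mean)

# Per-column removal: var2 keeps all of its values
by_col <- tapply(toy$var2, toy$varX, mean, na.rm = TRUE)

by_row   # group "a" uses only the value 10
by_col   # group "a" uses both 10 and 20
```

Which behaviour is appropriate depends on whether rows with partial data should count at all.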

Upvotes: 1
