Reputation: 121
I have a data.frame "df" with 200 observations and 18 columns, named var1, var2, etc. When I use:
tapply(df$var1, INDEX=df$varX, FUN=mean, na.rm=T)
where varX is one particular variable (column) of type string, I get the mean of var1 for each value of varX. My question is: how can I put the above command in a for loop so that it repeats the same command for all the variables (var1, var2, etc.) except, of course, varX? I tried this:
for (k in c(var1, var2, ..., varn)) {
tapply(df$k, INDEX=df$varX, FUN=mean, na.rm=T)
}
But it did not work.
Please note: I am sure much more effective and elegant methods can be used, but since I am a beginner and still far behind, I sometimes try to apply some ideas before I finish reading the relevant chapter of a book I have. This is why my methods sometimes look primitive.
Upvotes: 1
Views: 1555
Reputation: 99331
You could use rowsum(), which is one of the fastest base R aggregation functions (although here we need to divide the sums by the counts of the grouping variable to get the means). Following BrodieG's example using data(iris) grouped by Species, we can do:
grp <- iris$Species
rowsum(iris[-5], grp, na.rm = TRUE) / tabulate(grp, nlevels(grp))
#            Sepal.Length Sepal.Width Petal.Length Petal.Width
# setosa            5.006       3.428        1.462       0.246
# versicolor        5.936       2.770        4.260       1.326
# virginica         6.588       2.974        5.552       2.026
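One caveat, assuming the data may contain missing values: tabulate() counts every row in each group, NA or not, so with na.rm = TRUE the sums and the counts no longer refer to the same values. A minimal sketch of one way around this divides by a second rowsum() of the non-missing indicators instead. The two injected NAs are hypothetical, added only to make the difference visible; sums, counts, means and naive are illustrative names.

```r
# Sketch: rowsum()-based group means that stay correct when values are missing.
# The NAs below are hypothetical, injected only to show the difference.
x <- as.matrix(iris[-5])            # numeric columns as a matrix
grp <- iris$Species
x[c(1, 60), "Sepal.Length"] <- NA   # row 1 is setosa, row 60 is versicolor

sums   <- rowsum(x, grp, na.rm = TRUE)        # per-group sums, NAs skipped
counts <- rowsum((!is.na(x)) + 0L, grp)       # per-group count of non-NA values
means  <- sums / counts                       # element-wise: group x column

# Dividing by tabulate() counts every row, NA or not, so it underestimates
# the affected means.
naive <- sums / tabulate(grp, nlevels(grp))
means
```

When there are no NAs (as in the unmodified iris), counts equals tabulate() in every column and both versions agree.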
Upvotes: 1
Reputation: 52637
The most direct adaptation of what you are looking for (using iris as the example data frame) is:
for(k in iris[-5]) # we loop through the columns in `iris`, except last
print(tapply(k, INDEX=iris$Species, FUN=mean, na.rm=T))
Which produces:
    setosa versicolor  virginica
     5.006      5.936      6.588
    setosa versicolor  virginica
     3.428      2.770      2.974
    setosa versicolor  virginica
     1.462      4.260      5.552
    setosa versicolor  virginica
     0.246      1.326      2.026
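The loop in the question fails because df$k does not evaluate k: $ takes its right-hand side literally and looks for a column named "k", which does not exist, so it returns NULL. A minimal sketch of the fix, assuming a small hypothetical df with a character grouping column varX: loop over the column names (excluding varX) and index with [[ ]], which does evaluate k. Storing each result in a list (results, an illustrative name) keeps it after the loop instead of only printing it.

```r
# Hypothetical stand-in for the question's df: varX is a character grouping
# column, var1 and var2 are numeric (var1 has one missing value).
df <- data.frame(
  varX = c("a", "b", "a", "b"),
  var1 = c(1, 2, 3, NA),
  var2 = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

k <- "var1"
is.null(df$k)   # TRUE: `$` does not evaluate k, it looks for a column named "k"

# Loop over the column *names* and index with [[ ]], which does evaluate k.
results <- list()
for (k in setdiff(names(df), "varX")) {
  results[[k]] <- tapply(df[[k]], INDEX = df$varX, FUN = mean, na.rm = TRUE)
}
results
```

Inside a for loop nothing is printed automatically, which is also why the loop above wraps tapply() in print().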
Slightly more elegantly, using sapply instead of for:
sapply(iris[-5], tapply, INDEX=iris$Species, mean, na.rm=T)
which produces:
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa            5.006       3.428        1.462       0.246
versicolor        5.936       2.770        4.260       1.326
virginica         6.588       2.974        5.552       2.026
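Applied to the question's df, the grouping column is better dropped by name than with [-5], since varX need not be the last column. A minimal sketch on a small hypothetical df: mean is passed to tapply() positionally, as in the line above, because writing FUN = mean would bind to sapply()'s own FUN argument instead. The name m is illustrative.

```r
# Hypothetical stand-in for the question's df: one character grouping
# column varX plus numeric columns (var1 has one missing value).
df <- data.frame(
  varX = c("a", "b", "a", "b"),
  var1 = c(1, 2, 3, NA),
  var2 = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Drop the grouping column by name, then apply tapply() to each remaining
# column. mean is positional here so it reaches tapply(), not sapply().
m <- sapply(df[setdiff(names(df), "varX")], tapply,
            INDEX = df$varX, mean, na.rm = TRUE)
m   # rows are the groups of varX, columns are var1 and var2
```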
But really, you want to use aggregate, dplyr, or data.table, as others have suggested:
data.table(iris)[, lapply(.SD, mean, na.rm=TRUE), by=Species]
iris %>% group_by(Species) %>% summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
aggregate(. ~ Species, iris, mean, na.rm = TRUE) # Courtesy David Arenburg
The first two require loading the packages data.table and dplyr, respectively.
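A caveat if you carry the aggregate() line over to data with missing values: the formula method's default na.action = na.omit removes every row that has an NA in any column before grouping, so na.rm = TRUE never comes into play and the other columns lose values too. A minimal sketch on a small hypothetical df, where only var1 is missing in row 4; agg_omit and agg_pass are illustrative names.

```r
# Hypothetical stand-in for the question's df; row 4 has an NA in var1 only.
df <- data.frame(
  varX = c("a", "b", "a", "b"),
  var1 = c(1, 2, 3, NA),
  var2 = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Formula-method default (na.action = na.omit): row 4 is dropped for every
# column, so var2 for group "b" is averaged over a single value.
agg_omit <- aggregate(. ~ varX, data = df, FUN = mean, na.rm = TRUE)

# na.action = na.pass keeps row 4; mean(na.rm = TRUE) then skips the NA
# column by column, matching what tapply()/sapply() give.
agg_pass <- aggregate(. ~ varX, data = df, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)
agg_omit
agg_pass
```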
Upvotes: 1
Reputation: 15458
library(dplyr)
df %>%
  na.omit() %>%                          # drops every row that has any NA
  group_by(varX) %>%
  summarise(across(everything(), mean))  # mean of each non-grouping column
Upvotes: 1