Reputation: 231
EDIT: This question is solved: the function worked once a typo was corrected. I have corrected the typo and left the example as a possible reference for others. More efficient solutions are also suggested in the answers.
Original (corrected) post:
I would like to write a function that performs a calculation on different subsets of a data set, using a logical operator to define the subsets.
Here is a simplified example using a data frame containing 2 groups ("A" and "B") with 2 values each:
# Rows alternate between the groups: A, B, A, B
df <- data.frame(group = rep(c("A", "B"), times = 2),
                 var = c(1, 4, 1, 4),
                 stringsAsFactors = FALSE)
To calculate, for example, the mean of each group, A and B, it is possible to use the logical operator == to subset the data:
>mean(df$var[df$group=="A"])
[1] 1
>mean(df$var[df$group=="B"])
[1] 4
This is of course easy to do with only a few groups, but with a larger dataset it would be convenient to have a function that calculates the mean for several different groups (with the group names supplied, for example, as a vector). My idea for constructing such a function looks something like this:
autoMean <- function(q) {
  mean(df$var[df$group == q])
}
It would be run like this to get the means for the 2 groups, A and B:
groups<-c("A","B")
autoMean(groups)
R does not complain when I define the function, and it works fine for a single group. (Be aware, though, that when it is called with several groups at once, == compares df$group with q element by element, recycling q. The function then returns a single mean over the rows that happen to match, here all four rows, giving 2.5, and not one mean per group.)
So putting a function argument inside a logical comparison does work, contrary to what I believed when I posted this question.
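To make the multi-group behaviour concrete, here is a minimal sketch using the example data from the question. It assumes base R only, and the names overall and per_group are introduced here purely for illustration. With a vector on the right-hand side, == is an element-wise comparison with recycling, not a membership test, whereas sapply() calls the function once per group.

```r
# Example data from the question
df <- data.frame(group = rep(c("A", "B"), times = 2),
                 var = c(1, 4, 1, 4),
                 stringsAsFactors = FALSE)

autoMean <- function(q) {
  mean(df$var[df$group == q])
}

groups <- c("A", "B")

# == recycles `groups` against df$group element by element.
# The rows happen to be ordered A, B, A, B, so every comparison is TRUE
# and the mean of all four values is returned.
overall <- autoMean(groups)

# One mean per group: call the function once for each group name.
per_group <- sapply(groups, autoMean)
```

With this data, overall is 2.5 and per_group is a named vector with A = 1 and B = 4. If the rows were ordered differently, the recycled comparison would select a different set of rows, so the single-call result should not be relied on.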
Other, possibly more elegant, ways of solving this kind of problem are presented in the kindly provided answers below.
Upvotes: 2
Views: 243
Reputation: 5274
Also:
aggregate(var ~ group, data=df, FUN=mean)
library(plyr)
ddply(df, .(group), summarize, mean=mean(var))
### add column with mean of each group
cbind(df, with(df, ave(var, group)))
Be careful: naming an object df shadows the name of the F distribution density function df() in package:stats, which is loaded by default.
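As a small illustration of that name clash, the sketch below (base R only; d1 and d2 are names made up for this example) shows that the stats function remains reachable. When a name is used in call position, R looks for a function and skips non-function objects, and the namespace-qualified form stats::df() is unambiguous in any case.

```r
# A data frame that reuses the name df
df <- data.frame(group = rep(c("A", "B"), times = 2), var = c(1, 4, 1, 4))

# In call position R searches for a function named df,
# so the data frame is skipped and the stats function is found.
d1 <- df(1, df1 = 2, df2 = 3)

# The qualified form does not depend on what else is named df.
d2 <- stats::df(1, df1 = 2, df2 = 3)
```

Even so, a more descriptive name for the data frame avoids confusion for readers of the code.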
Upvotes: 3
Reputation: 456
An even slicker approach would be to use the dplyr package.
library(dplyr)
summarise(group_by(df, group),
meanValue = mean(var))
Upvotes: 3
Reputation: 456
I think you have a typo in your original function definition, which is probably causing your error. Try this:
autoMean <- function(df, q) {
  # Return the group label and its mean as a one-row data frame
  data.frame(q = q, mean = mean(df$var[df$group == q]))
}
groups<-c("A","B")
results <- lapply(groups, autoMean, df = df)
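lapply() returns a list holding one single-row data frame per group. One possible way to collect these into a single table, assuming base R only, is do.call(rbind, ...); the name results_table is introduced here for illustration.

```r
# Example data from the question
df <- data.frame(group = rep(c("A", "B"), times = 2),
                 var = c(1, 4, 1, 4),
                 stringsAsFactors = FALSE)

autoMean <- function(df, q) {
  # One row per call: the group label and its mean
  data.frame(q = q, mean = mean(df$var[df$group == q]))
}

groups <- c("A", "B")
results <- lapply(groups, autoMean, df = df)

# Stack the single-row data frames into one data frame.
results_table <- do.call(rbind, results)
```

Because the function is called once per group, the == comparison always receives a single group name, so no recycling across groups takes place.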
Upvotes: 1
Reputation: 25736
Maybe you are looking for tapply:
tapply(X=df$var, INDEX=df$group, FUN=mean)
# A B
# 1 4
Upvotes: 3