Smerla

Reputation: 231

Replace a variable in a logical operator within a function

EDIT: This question is solved: the function worked once a typo was corrected. I have corrected the typo and left the example here as a reference for others. More efficient solutions are also suggested in the answers.

Original (corrected) post:
I would like to write a function that performs a calculation on different subsets of a data set, using a logical operator to define the subsets.

Here is a simplified example using a data frame containing 2 groups ("A" and "B") with 2 observations each:

df <- data.frame(matrix(0, ncol = 2, nrow = 4))
colnames(df) <- c("group","var")
df$group <- c("A","B")
df$var <- c(1,4,1,4)

To calculate e.g. the mean of the different groups, A and B, it is possible to use the logical operator == to subset the data:

> mean(df$var[df$group=="A"])
[1] 1
> mean(df$var[df$group=="B"])
[1] 4

This is of course easy to do with only a few groups, but with a larger dataset it would be convenient to have a function that calculates the mean for several different groups (given their names, for example, as a vector). My idea of how to construct such a function (which I assumed was wrong) looks something like this:

autoMean <- function (q) {
mean(df$var[df$group==q])
}

And be run like this, in order to get the means for the 2 groups, A and B:

groups<-c("A","B")
autoMean(groups)

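R does not complain when I define the function, and it works fine for a single group. (Be aware, though, that when it is run with several groups at once, as above, == compares element by element and recycles the shorter vector, so the function returns a single number rather than one mean per group. In this example that number is 2.5, the mean of all values.)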
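To make the multiple-group behaviour concrete, here is a minimal sketch using the same example data (the names single and perGroup are only illustrative). It contrasts one call with the whole vector against one call per group via sapply:

```r
# Same example data as above
df <- data.frame(group = c("A", "B", "A", "B"), var = c(1, 4, 1, 4))

autoMean <- function(q) {
  mean(df$var[df$group == q])
}

groups <- c("A", "B")

# One call with the whole vector: == recycles `groups` along df$group,
# so every row happens to match here and a single number comes back
single <- autoMean(groups)

# One call per group: a named vector with one mean per group
perGroup <- sapply(groups, autoMean)
```

Note that the single-call result depends on how the rows happen to line up with the recycled vector, so it should not be relied on as an overall mean.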

So, putting a function argument inside a logical comparison does work, contrary to what I believed when I posted this question.

The answers kindly provided below present other, possibly more elegant, ways of solving this kind of problem.

Upvotes: 2

Views: 243

Answers (4)

dardisco

Reputation: 5274

Also:

aggregate(var ~ group, data=df, FUN=mean)
library(plyr)
ddply(df, .(group), summarize, mean=mean(var))
### add column with mean of each group
cbind(df, with(df, ave(var, group)))

Be careful: an object named df masks df(), the density function of the F distribution in package:stats, which is loaded by default.
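A small sketch of what that name clash means in practice, assuming the example data frame from the question (dens and densCall are illustrative names). The data frame is found when df is used as a value, while the F density stays reachable:

```r
df <- data.frame(group = c("A", "B", "A", "B"), var = c(1, 4, 1, 4))

# Used as a value, `df` now refers to the data frame
isFrame <- is.data.frame(df)

# The F density is still available through its namespace
dens <- stats::df(1, df1 = 2, df2 = 3)

# When `df` is called as a function, R skips non-function objects,
# so this still reaches the F density
densCall <- df(1, df1 = 2, df2 = 3)
```

So nothing breaks, but a different name for the data frame avoids confusion when reading the code.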

Upvotes: 3

Nan

Reputation: 456

An even slicker approach would be to use the dplyr package.

library(dplyr)
summarise(group_by(df, group), 
          meanValue = mean(var))

Upvotes: 3

Nan

Reputation: 456

I think you have a typo in your original function definition. That's probably causing your error. Try this:

autoMean <- function(df, q) {
  # Return the group label and its mean as a one-row data frame
  data.frame(q = q, mean = mean(df$var[df$group == q]))
}

groups <- c("A", "B")
# Call autoMean once per group; the result is a list of one-row data frames
results <- lapply(groups, autoMean, df = df)
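Since lapply returns a list, one possible follow-up is to stack the pieces into a single table. This is a sketch in base R using the question's example data; resultTable is an illustrative name:

```r
df <- data.frame(group = c("A", "B", "A", "B"), var = c(1, 4, 1, 4))

autoMean <- function(df, q) {
  data.frame(q = q, mean = mean(df$var[df$group == q]))
}

groups <- c("A", "B")
results <- lapply(groups, autoMean, df = df)

# Bind the one-row data frames together, one row per group
resultTable <- do.call(rbind, results)
```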

Upvotes: 1

sgibb

Reputation: 25736

Maybe you are looking for tapply:

tapply(X=df$var, INDEX=df$group, FUN=mean)
# A B 
# 1 4
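The result of tapply is named by group, so it can be indexed with the group names from the question. A brief sketch, assuming the same example data (groupMeans, meanA and wanted are illustrative names):

```r
df <- data.frame(group = c("A", "B", "A", "B"), var = c(1, 4, 1, 4))

groupMeans <- tapply(X = df$var, INDEX = df$group, FUN = mean)

# Look up one group by name
meanA <- as.numeric(groupMeans["A"])

# Or restrict the result to a chosen vector of group names
wanted <- c("B", "A")
selected <- groupMeans[wanted]
```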

Upvotes: 3
