Reputation: 9
I can't figure out how to calculate the mean for a subset of a column in R. Specifically, I want the mean of "expenditures" for "age" 40+ and for "age" <40. I've tried
mean(expenditures[["age">=40]])
which ran without an error, but
mean(expenditures[["age"<40]])
was not successful.
I'm stuck on this problem and would greatly appreciate any help with this seemingly simple question.
Upvotes: 0
Views: 2224
Reputation: 2959
You could do it in one go by creating a grouping column inside group_by() and then using summarise() to calculate the mean of each group:
library(dplyr)
data("mtcars")
mtcars %>%
  group_by(group = ifelse(hp > 100, "> 100", "<= 100")) %>%
  summarise(mean = mean(hp))
gives:
# A tibble: 2 x 2
  group   mean
  <chr>  <dbl>
1 <= 100  76.3
2 > 100  174.
Note: Thanks Tino for the tips!
Upvotes: 2
Reputation: 2101
If you don't want to use additional packages:
# some sample data:
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))
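# mean expenditures for age >= 40 (TRUE) and age < 40 (FALSE);
# the formula is passed positionally so the call does not depend on
# the name of aggregate()'s first argument
aggregate(
  expenditures ~ age >= 40,
  data = df,
  FUN = mean
)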
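If you only need the two numbers, plain logical indexing also works. As far as I can tell, the attempt in the question fails because "age" in quotes is a character string rather than the column, so "age" >= 40 compares text and collapses to a single TRUE or FALSE instead of one value per row. A minimal sketch, assuming a data frame df with columns age and expenditures (the sample data below is made up, since the original data is not shown):

```r
# Hypothetical sample data; the asker's real data frame is not shown.
set.seed(123)
df <- data.frame(age = sample(20:50, 100, replace = TRUE),
                 expenditures = runif(100, min = 100, max = 1000))

# Logical indexing: keep only the expenditures of rows matching the condition.
mean_40_plus  <- mean(df$expenditures[df$age >= 40])
mean_under_40 <- mean(df$expenditures[df$age < 40])

# tapply() computes both group means in one call;
# the result is named "FALSE" (age < 40) and "TRUE" (age >= 40).
group_means <- tapply(df$expenditures, df$age >= 40, mean)
group_means
```

If expenditures can contain missing values, add na.rm = TRUE to the mean() calls.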
And to add to Paul's solution, you could also create the group within group_by():
library(dplyr)
# using dplyr:
df %>%
  group_by(age >= 40) %>%
  summarise_at(.vars = vars(expenditures), mean)
Upvotes: 1