Reputation: 9

Calculate mean for subset of column

I can't figure out how to calculate the mean for a subset of a column in R. Specifically, I want the mean of "expenditures" for "age" 40+ and for "age" <40. I've tried

mean(expenditures[["age">=40]])

which returned a value, but

mean(expenditures[["age"<40]])

was not successful.

I am therefore stuck on this problem. I'll greatly appreciate any help on this seemingly simple question.

Upvotes: 0

Answers (2)

Paul

Reputation: 2959

You could do it in one go by creating a grouping column inside group_by() and then using summarise() to calculate the mean of each group:

library(dplyr)

data("mtcars")

mtcars %>%
  group_by(group = ifelse(hp > 100, "> 100", "<= 100")) %>%
  summarise(mean = mean(hp))

gives:

# A tibble: 2 x 2
  group   mean
  <chr>  <dbl>
1 <= 100  76.3
2 > 100   174.

Note: Thanks Tino for the tips!

Upvotes: 2

Tino

Reputation: 2101

If you don't want to use additional packages:

# some sample data:
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

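# the formula is passed unnamed, since the name of this first
# argument differs between R versions (formula vs. x)
aggregate(
  expenditures ~ age >= 40,
  data = df,
  FUN = mean
)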
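
As for why the attempt in the question did not work: "age" >= 40 compares the literal string "age" with 40, not the values in the age column, so the condition has to refer to the column itself. A minimal sketch of plain logical subsetting, assuming the data frame is called df with columns age and expenditures as in the sample data (your actual object names may differ; mean_40_plus and mean_under_40 are just illustrative names):

```r
# same sample data as above, repeated so the snippet runs on its own
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

# df$age >= 40 is a logical vector with one entry per row;
# it picks out the matching elements of df$expenditures
mean_40_plus  <- mean(df$expenditures[df$age >= 40])
mean_under_40 <- mean(df$expenditures[df$age < 40])

# both group means in a single call; the result is named "FALSE" / "TRUE"
tapply(df$expenditures, df$age >= 40, mean)
```

If the column contains missing values, adding na.rm = TRUE to mean() is probably what you want.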

And to add to Paul's solution, you could also create the group within group_by:

library(dplyr)
# using dplyr:
df %>% 
  group_by(age >= 40) %>% 
  summarise_at(.vars = vars(expenditures), mean)
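
In dplyr 1.0.0 and later, summarise_at() is superseded, so here is a sketch of the same idea with plain summarise() and readable group labels. It assumes the same sample df; age_group, mean_expenditures and result are names picked here for illustration:

```r
library(dplyr)

# same sample data as above, repeated so the snippet runs on its own
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

result <- df %>%
  # label each row by age group, then group on that label
  group_by(age_group = ifelse(age >= 40, "40+", "<40")) %>%
  # one row per group with the mean of expenditures
  summarise(mean_expenditures = mean(expenditures))

result
```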

Upvotes: 1
