Dilara
Dilara

Reputation: 11

aggregating a data frame over a column

I have a data frame, one of the columns representing years. Let's say

region <- c("Spain", "Italy", "Norway")
year   <- c("2010","2011","2012","2010","2011","2012","2010","2011","2012")
m1     <- c("10","11","12","13","14","15","16","17","18")
m2     <- c("20","30","40","50","60","70","80","90","100")
data   <- data.frame(region,year,m1,m2)

I want to aggregate the data set m1 in a way taking 3-year averages for each country. I am confused in how to do that with a data frame. Any comment is highly appreciated. Thanks in advance!

Upvotes: 1

Views: 76

Answers (1)

JasonAizkalns
JasonAizkalns

Reputation: 20463

First, your m1 variable needs to be numeric. Convert it using as.numeric():

data$m1 <- as.numeric(as.character(data$m1))

Then, you can use aggregate like this:

aggregate(m1 ~ region, FUN = mean, data = data)

#   region m1
# 1  Italy 14
# 2 Norway 15
# 3  Spain 13

To avoid the awkward type conversion (as.numeric(as.character())), you should eliminate the quotes from the setup for m1 and m2:

m1     <- c(10,11,12,13,14,15,16,17,18)
m2     <- c(20,30,40,50,60,70,80,90,100)

Alternative approach using dplyr:

library(dplyr)

region <- c("Spain", "Italy", "Norway")
year   <- c("2010","2011","2012","2010","2011","2012","2010","2011","2012")
m1     <- c(10,11,12,13,14,15,16,17,18)
m2     <- c(20,30,40,50,60,70,80,90,100)
data   <- data.frame(region,year,m1,m2)

data %>%
  group_by(region) %>%
  summarise(mean_m1 = mean(m1),
            mean_m2 = mean(m2))

#   region mean_m1 mean_m2
# 1  Italy      14      60
# 2 Norway      15      70
# 3  Spain      13      50

Upvotes: 1

Related Questions