paploo

Reputation: 173

grouping data in R and summing by decade

I have the following dataset:

ireland england france year
5         3      2     1920
4         3      4     1921
6         2      1     1922
3         1      5     1930
2         5      2     1931

I need to summarise the data by decade (the 1920s and the 1930s): one set of total points for ireland, england and france over 1920-1922, and another set of totals for the same three countries over 1930-1931.

Any ideas? I have tried but failed.

Dataset:

x <- read.table(text = "ireland england france year
5         3      2     1920
4         3      4     1921
6         2      1     1922
3         1      5     1930
2         5      2     1931", header = TRUE)

Upvotes: 5

Views: 4989

Answers (2)

Patricio Moracho

Reputation: 717

A base R solution

As A5C1D2H2I1M1N2O1R2T1 mentioned, you can use findInterval() to assign each year to its decade, and then aggregate() to sum by decade:

txt <-
"ireland england france year
5         3      2     1920
4         3      4     1921
6         2      1     1922
3         1      5     1930
2         5      2     1931"

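df <- read.table(text = txt, header = TRUE)

# Lower bound of each decade; findInterval() returns the index of the
# interval that each year falls into
decades <- c(1920, 1930, 1940)
df$decade <- decades[findInterval(df$year, decades)]

# Sum the three country columns within each decade
aggregate(cbind(ireland, england, france) ~ decade, data = df, FUN = sum)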

Output:

  decade ireland england france
1   1920      15       8      7
2   1930       5       6      7
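If the lookup step is unclear, here is a minimal sketch of what findInterval() returns for these years. The variable names `years`, `idx` and `decade_of_year` are only for illustration and are not part of the answer above.

```r
# Years from the question and the lower bound of each decade
years   <- c(1920, 1921, 1922, 1930, 1931)
decades <- c(1920, 1930, 1940)

# For each year, findInterval() gives the position of the last
# break point that is less than or equal to it
idx <- findInterval(years, decades)
print(idx)

# Indexing the break points with that position maps each year
# back to the start of its decade
decade_of_year <- decades[idx]
print(decade_of_year)
```

The break points need to be sorted in increasing order, and any year earlier than the first break point gets index 0, so it would be silently dropped by the `decades[idx]` lookup.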

Upvotes: 0

loki

Reputation: 10360

How about rounding each year down to its decade (divide by 10, floor, multiply by 10) and then summarizing?

library(dplyr)
x %>% mutate(decade = floor(year/10)*10) %>% 
      group_by(decade) %>% 
      summarize_all(sum) %>% 
      select(-year)
# A tibble: 2 x 4
#   decade ireland england france
#    <dbl>   <int>   <int>  <int>
# 1   1920      15       8      7
# 2   1930       5       6      7
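The same rounding idea should also work without dplyr. This is only a sketch, assuming the data is read in with the `year` column in the header; `totals` is a name picked for illustration, and `%/%` (integer division) stands in for `floor(year/10)`.

```r
x <- read.table(text = "ireland england france year
5         3      2     1920
4         3      4     1921
6         2      1     1922
3         1      5     1930
2         5      2     1931", header = TRUE)

# Integer-divide by 10 and multiply back: 1921 -> 192 -> 1920
x$decade <- x$year %/% 10 * 10

# Sum the three country columns within each decade
totals <- aggregate(cbind(ireland, england, france) ~ decade,
                    data = x, FUN = sum)
print(totals)
```

Unlike a fixed vector of break points, this handles any decade that turns up in the data without having to list it in advance.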

Upvotes: 7
