Calculate sum by grouping by column value in R

I have a data frame with two columns, Ref_Date and Value. The Ref_Date column contains 12 rows for each year, from 1988 to 2015. I need to group by year only and sum the Value column, so that I get one row per year containing the total of that year's 12 monthly values:

row.names   Ref_Date    Value
166483      1989/01     713
166484      1989/02     771
166485      1989/03     565
166486      1989/04     1248
166487      1989/05     1380
166488      1989/06     1118
166489      1989/07     1026
166490      1989/08     995
166491      1989/09     835
166492      1989/10     939
166493      1989/11     878
166494      1989/12     1075
166495      1990/01     878
166496      1990/02     563
166497      1990/03     773
166498      1990/04     1131
166499      1990/05     1562
166500      1990/06     1747
166501      1990/07     1258
166502      1990/08     791

Upvotes: 0

Answers (2)

talat

Reputation: 70266

You can use the following code with dplyr:

library(dplyr)
df %>% 
  group_by(year = substr(Ref_Date, 1, 4)) %>%     # create the groups
  summarise(Value = sum(Value))

#Source: local data frame [2 x 2]
#
#  year Value
#1 1989 11543
#2 1990  8703

Or similarly with the data.table package:

library(data.table)
setDT(df)[, sum(Value), by = .(year = substr(Ref_Date, 1, 4))]
#   year    V1
#1: 1989 11543
#2: 1990  8703

Or with base R:

# add a year column on the fly, then sum Value within each year
aggregate(Value ~ year, data = transform(df, year = substr(Ref_Date, 1, 4)), FUN = sum)
#  year Value
#1 1989 11543
#2 1990  8703
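
To try any of the approaches above, the sample rows from the question can be rebuilt as a data frame. This is only a minimal sketch: it assumes the columns are named Ref_Date and Value, that Ref_Date holds text in YYYY/MM form, and it stores the year as an integer in a helper column called year (my own naming, not from the question).

```r
# Rebuild the 20 sample rows shown in the question (row names omitted).
df <- data.frame(
  Ref_Date = c(sprintf("1989/%02d", 1:12), sprintf("1990/%02d", 1:8)),
  Value = c(713, 771, 565, 1248, 1380, 1118, 1026, 995, 835, 939, 878, 1075,
            878, 563, 773, 1131, 1562, 1747, 1258, 791),
  stringsAsFactors = FALSE
)

# Derive the year as an integer column, then sum Value within each year.
df$year <- as.integer(substr(df$Ref_Date, 1, 4))
yearly <- aggregate(Value ~ year, data = df, FUN = sum)
print(yearly)
```

Storing the year as an integer rather than a string makes it easier to sort or filter the result numerically later on.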

Upvotes: 2

anon

Another option is to use tapply:

years <- 1988:2015 ## or first.year:last.year
sums <- tapply(df$Value, substr(df$Ref_Date, 1, 4), sum)
new.df <- data.frame(years = years, sums = sums)

EDIT: A more general solution that avoids hard-coding the range of years (it is basically similar to the one posted above):

years <- substr(df$Ref_Date, 1, 4)
sums <- tapply(df$Value, years, sum)
new.df <- data.frame(years = names(sums), sum = as.vector(sums))  # names(sums) stays aligned with the sums
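
One detail worth checking with tapply: it returns a named array ordered by the sorted group labels, so the year labels can also be read from names(sums), which stays aligned with the sums even when the rows are not in date order. A small sketch with made-up, deliberately unsorted rows (hypothetical data, not from the question):

```r
# Hypothetical, deliberately unsorted rows (not the question's data).
df <- data.frame(
  Ref_Date = c("1990/01", "1989/01", "1990/02", "1989/02"),
  Value = c(10, 1, 20, 2),
  stringsAsFactors = FALSE
)

years <- substr(df$Ref_Date, 1, 4)
sums <- tapply(df$Value, years, sum)

# names(sums) holds the group labels in the same order as the sums.
new.df <- data.frame(years = names(sums), sum = as.vector(sums))
print(new.df)
```

Here unique(years) would start with 1990 (first appearance), while the sums start with 1989, so pairing the two would mislabel the totals.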

Upvotes: 1
