N.Varela
N.Varela

Reputation: 910

R: How to sum up (aggregate) values of dfs according to column criteria within a list?

I want to sum up all Values from the same country in a list of dfs (df-wise!). Here some example data:

df1 <- data.frame(CNTRY = c("A", "B", "C"), Value=c(3,1,4))
df2 <- data.frame(CNTRY = c("A", "B", "C", "C"),Value=c(3,5,8,7))
dfList <- list(df1, df2)
names(dfList) <- c("111.2000", "112.2000")

My list is consisting of dfs (only dfs) with different rownumbers, but same columnsstructure for all dfs. The listnames are a mixture of articleIDs and Year, this are more then 1000 dfs. Now my question: How can sum up or aggregate the country values in all dfs? My expected result is:

$`111.2000`
  CNTRY Value
1     A     3
2     B     1
3     C     4

$`112.2000`
  CNTRY Value
1     A     3
2     B     5
3     C    15

I tried aggregate(Value ~ CNTRY, data=dfList,FUN=sum) which delivers an error message as, CNTRY and VALUE are not the objects, but the columns within the objects. Any ideas? Thanks in advance.

Upvotes: 2

Views: 275

Answers (2)

Frank
Frank

Reputation: 66819

A list of DFs is better than a bunch of DFs in the wild, but a single DF is even better:

library(data.table)
DF = rbindlist(dfList, idcol="id")

         id CNTRY Value
1: 111.2000     A     3
2: 111.2000     B     1
3: 111.2000     C     4
4: 112.2000     A     3
5: 112.2000     B     5
6: 112.2000     C     8
7: 112.2000     C     7

From there, you can aggregate according to data.table syntax

DTres <- DF[, .(Value = sum(Value)), by=.(id, CNTRY)]

         id CNTRY Value
1: 111.2000     A     3
2: 111.2000     B     1
3: 111.2000     C     4
4: 112.2000     A     3
5: 112.2000     B     5
6: 112.2000     C    15

From here, you can do things like

dcast(DTres, id ~ CNTRY)

         id A B  C
1: 111.2000 3 1  4
2: 112.2000 3 5 15

I'm sure there's some way to do this in base R as well, but I'd say don't bother.

Upvotes: 4

Rich Scriven
Rich Scriven

Reputation: 99351

Use lapply() for applying the aggregate() function over dfList.

lapply(dfList, function(x) aggregate(Value ~ CNTRY, x, sum))
# $`111.2000`
#   CNTRY Value
# 1     A     3
# 2     B     1
# 3     C     4
#
# $`112.2000`
#   CNTRY Value
# 1     A     3
# 2     B     5
# 3     C    15

Upvotes: 4

Related Questions