kwicher

Reputation: 2082

Summarizing an unknown number of columns in R using dplyr

I have the following data.frame (df):

ID1 ID2 Col1 Col2 Col3 Grp
A   B   1    3    6    G1
C   D   3    5    7    G1
E   F   4    5    7    G2
G   h   5    6    8    G2

What I would like to achieve is the following: group by Grp (easy), then summarize so that, for each group, the numeric columns are summed and two new columns hold strings listing all the ID1s and ID2s.

It would be something like this:

df %>% 
   group_by(Grp) %>% 
      summarize(ID1s=toString(ID1), ID2s=toString(ID2), Col1=sum(Col1), Col2=sum(Col2), Col3=sum(Col3))

Everything is fine when I know the number of columns (Col1, Col2, Col3). However, I would like an implementation that works for any data frame in which ID1, ID2 and Grp are always present under those names, alongside any number of additional numeric columns with unknown names.

Is there a way to do this in dplyr?

Upvotes: 1

Views: 707

Answers (2)

user6278894


Using data.table, you could try the following:

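   library(data.table)
   setDT(df)
   # every column except the IDs and the grouping column gets summed
   sd_cols <- setdiff(names(df), c("ID1", "ID2", "Grp"))
   merge(df[, .(ID1s = toString(ID1), ID2s = toString(ID2)), by = Grp],
         df[, lapply(.SD, sum), by = Grp, .SDcols = sd_cols], by = "Grp")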

Upvotes: 0

Frank

Reputation: 66819

I would like to be able to implement it so that it would work for a data frame with known and always named the same ID1, ID2, Grp, and any number of additional numeric column with unknown names.

You can overwrite the ID columns first and then group by them as well:

DF %>% 
  group_by(Grp) %>% mutate_each(funs(. %>% unique %>% sort %>% toString), ID1, ID2) %>% 
  group_by(ID1, ID2, add=TRUE) %>% summarise_each(funs(sum))

# Source: local data frame [2 x 6]
# Groups: Grp, ID1 [?]
# 
#     Grp   ID1   ID2  Col1  Col2  Col3
#   (chr) (chr) (chr) (int) (int) (int)
# 1    G1  A, C  B, D     4     8    13
# 2    G2  E, G  F, h     9    11    15

I think you'll want to uniqify and sort before collapsing to a string, so I've added those steps.
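
For readers on a newer dplyr: mutate_each(), summarise_each() and funs() have since been deprecated, as has the add= argument of group_by(). Here is a rough sketch of the same idea using across(), assuming dplyr 1.0.0 or later. The df below is just the sample data from the question rebuilt by hand, and res is a name chosen for illustration.

```r
library(dplyr)

# Sample data from the question (rebuilt here so the snippet is self-contained)
df <- data.frame(
  ID1  = c("A", "C", "E", "G"),
  ID2  = c("B", "D", "F", "h"),
  Col1 = c(1, 3, 4, 5),
  Col2 = c(3, 5, 5, 6),
  Col3 = c(6, 7, 7, 8),
  Grp  = c("G1", "G1", "G2", "G2"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Grp) %>%
  summarise(
    # collapse the two known ID columns into de-duplicated, sorted strings
    across(c(ID1, ID2), ~ toString(sort(unique(.x)))),
    # sum every remaining numeric column, whatever it happens to be called
    across(where(is.numeric), sum),
    .groups = "drop"
  )

print(res)
```

Because the numeric columns are picked by type rather than by name, this should keep working when extra numeric columns are added. On the sample data it should give the same sums as the output above.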

Upvotes: 4
