sum up rows based on row.names and condition in col.names -- R

Question

# Values are stored as numbers (not quoted strings) so they can be summed.
df <- data.frame(
  row.names        = c('1s.u1', '1s.u2', '2s.u1', '2s.u2', '6s.u1'),
  fjri_deu_klcea   = c(0, 0, 0, 15, 23),
  hfue_klcea       = c(2, 2, 0, 156, 45),
  dji_dhi_ghcea_jk = c(456, 0, 0, 15, 15),
  jdi_jdi_ghcea    = c(1, 2, 3, 4, 100),
  gz7_jfu_dcea_jdi = c(5, 6, 3, 7, 56)
)

df
      fjri_deu_klcea hfue_klcea dji_dhi_ghcea_jk jdi_jdi_ghcea gz7_jfu_dcea_jdi
1s.u1              0          2              456             1                5
1s.u2              0          2                0             2                6
2s.u1              0          0                0             3                3
2s.u2             15        156               15             4                7
6s.u1             23         45               15           100               56

I want to sum up df based on the cea part of the column names: within each row, all columns that share the same cea part should be added together. The result should look like this:

      klcea ghcea dcea
1s.u1     2   457    5
1s.u2     2     2    6
2s.u1     0     3    3
2s.u2   171    19    7
6s.u1    68   115   56

My idea was to first create a new variable called cea that holds the cea part of each column name, and then sum by row.names and that cea value with something like with(df, ave(cea, row.names(df), FUN = sum))

I do not know how to generate that variable from a pattern in a string. I guess grepl is useful, but I could not get it to work. I tried df$cea <- df[grepl(colnames(df),'cea'),], which is wrong.

Ronak Shah · Accepted Answer

Using base R, you can extract the "cea" part from each column name and pass it to split.default to split the data frame into groups of columns. You can then use rowSums on each of the resulting data frames to get the row-wise sum per group.

sapply(split.default(df, sub('.*_(.*cea).*', '\\1', names(df))), rowSums)

#      dcea ghcea klcea
#1s.u1    5   457     2
#1s.u2    6     2     2
#2s.u1    3     3     0
#2s.u2    7    19   171
#6s.u1   56   115    68
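
To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea. It assumes the columns may arrive as character or factor (which is what happens when the values are quoted in data.frame()), so it converts them to numeric first, because rowSums only accepts numeric input. The intermediate names keys, groups and res are just illustrative, and the final reordering step is an optional addition to match the column order asked for in the question.

```r
# Small copy of the example data. The values are quoted on purpose so the
# columns start out as character (or factor in R < 4.0).
df <- data.frame(
  row.names        = c("1s.u1", "1s.u2", "2s.u1", "2s.u2", "6s.u1"),
  fjri_deu_klcea   = c("0", "0", "0", "15", "23"),
  hfue_klcea       = c("2", "2", "0", "156", "45"),
  dji_dhi_ghcea_jk = c("456", "0", "0", "15", "15"),
  jdi_jdi_ghcea    = c("1", "2", "3", "4", "100"),
  gz7_jfu_dcea_jdi = c("5", "6", "3", "7", "56")
)

# Convert every column to numeric; going through as.character() keeps the
# conversion safe for factor columns as well. df[] preserves the row names.
df[] <- lapply(df, function(x) as.numeric(as.character(x)))

# Step 1: one grouping key per column, e.g. "klcea" for "fjri_deu_klcea".
keys <- sub(".*_(.*cea).*", "\\1", names(df))

# Step 2: split the data frame column-wise into a list of data frames,
# one per key (list names are sorted alphabetically).
groups <- split.default(df, keys)

# Step 3: sum across the columns of each group; sapply() binds the
# resulting vectors into a matrix with one column per key.
res <- sapply(groups, rowSums)

# Optional: put the columns back in order of first appearance.
res <- res[, unique(keys), drop = FALSE]
res
#       klcea ghcea dcea
# 1s.u1     2   457    5
# 1s.u2     2     2    6
# 2s.u1     0     3    3
# 2s.u2   171    19    7
# 6s.u1    68   115   56
```

Note that res is a matrix; wrap it in as.data.frame() if a data frame is needed.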

where the sub() call returns:

sub('.*_(.*cea).*', '\\1', names(df))
#[1] "klcea" "klcea" "ghcea" "ghcea" "dcea"
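
The pattern works because the leading ".*_" is greedy: it consumes everything up to the last underscore that is still followed by a segment ending in "cea", and the capture group keeps only that segment. As a rough cross-check, the sketch below compares it with an alternative that matches the segment directly using regexpr and regmatches. The vector nms simply repeats the column names from the question, and the alternative pattern "[^_]*cea" is my own suggestion, not something the data requires.

```r
# Column names from the question.
nms <- c("fjri_deu_klcea", "hfue_klcea", "dji_dhi_ghcea_jk",
         "jdi_jdi_ghcea", "gz7_jfu_dcea_jdi")

# Approach used above: replace the whole name with the captured segment.
keys_sub <- sub(".*_(.*cea).*", "\\1", nms)

# Alternative: pull out the first run of non-underscore characters
# that ends in "cea".
keys_rm <- regmatches(nms, regexpr("[^_]*cea", nms))

keys_sub
# [1] "klcea" "klcea" "ghcea" "ghcea" "dcea"

identical(keys_sub, keys_rm)
# [1] TRUE
```

One difference to keep in mind: sub() returns a name unchanged when the pattern does not match, whereas regmatches() drops non-matching names, so the result can be shorter than the input. For splitting columns, sub() is the safer choice because it always returns one key per column.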
