Reputation: 59
# Values are stored as numbers (not quoted strings) so that they can be summed
df <- data.frame(
  row.names        = c('1s.u1', '1s.u2', '2s.u1', '2s.u2', '6s.u1'),
  fjri_deu_klcea   = c(0, 0, 0, 15, 23),
  hfue_klcea       = c(2, 2, 0, 156, 45),
  dji_dhi_ghcea_jk = c(456, 0, 0, 15, 15),
  jdi_jdi_ghcea    = c(1, 2, 3, 4, 100),
  gz7_jfu_dcea_jdi = c(5, 6, 3, 7, 56)
)
df
      fjri_deu_klcea hfue_klcea dji_dhi_ghcea_jk jdi_jdi_ghcea gz7_jfu_dcea_jdi
1s.u1              0          2              456             1                5
1s.u2              0          2                0             2                6
2s.u1              0          0                0             3                3
2s.u2             15        156               15             4                7
6s.u1             23         45               15           100               56
I want to sum up df based on the cea part of the column names: within each row, all columns that share the same cea part should be added together.
The result should look like this:
      klcea ghcea dcea
1s.u1     2   457    5
1s.u2     2     2    6
2s.u1     0     3    3
2s.u2   171    19    7
6s.u1    68   115   56
I thought about first creating a new column called cea that holds the cea name, and then summing by row.names and the respective cea with something like with(df, ave(cea, row.names(df), FUN = sum)).
I do not know how to generate the new column based on a pattern in a string. I guess grepl is useful, but I could not come up with anything that works. I tried df$cea <- df[grepl(colnames(df),'cea'),] which is wrong...
Upvotes: 0
Views: 784
Reputation: 11584
Using dplyr:
> library(dplyr)
> df %>% rowwise() %>%
+   mutate(klcea = sum(c_across(ends_with('klcea'))),
+          ghcea = sum(c_across(contains('ghcea'))),
+          dcea = sum(c_across(contains('dcea')))) %>%
+   select(klcea, ghcea, dcea)
# A tibble: 5 x 3
# Rowwise:
  klcea ghcea  dcea
  <dbl> <dbl> <dbl>
1     2   457     5
2     2     2     6
3     0     3     3
4   171    19     7
5    68   115    56
If you wish to retain row names:
> library(tibble)  # rownames_to_column() and column_to_rownames() come from tibble
> df %>% rownames_to_column('rn') %>% rowwise() %>%
+   mutate(klcea = sum(c_across(ends_with('klcea'))),
+          ghcea = sum(c_across(contains('ghcea'))),
+          dcea = sum(c_across(contains('dcea')))) %>%
+   select(klcea, ghcea, dcea, rn) %>% column_to_rownames('rn')
      klcea ghcea dcea
1s.u1     2   457    5
1s.u2     2     2    6
2s.u1     0     3    3
2s.u2   171    19    7
6s.u1    68   115   56
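As a possible variation on the pipeline above, the row-by-row sums can also be written with rowSums() over across(), which avoids rowwise() altogether. This is only a sketch: it assumes dplyr 1.0 or later and numeric columns, and the name res is purely illustrative.

```r
library(dplyr)

# Same data as in the question, with numeric values (assumed here)
df <- data.frame(
  row.names        = c("1s.u1", "1s.u2", "2s.u1", "2s.u2", "6s.u1"),
  fjri_deu_klcea   = c(0, 0, 0, 15, 23),
  hfue_klcea       = c(2, 2, 0, 156, 45),
  dji_dhi_ghcea_jk = c(456, 0, 0, 15, 15),
  jdi_jdi_ghcea    = c(1, 2, 3, 4, 100),
  gz7_jfu_dcea_jdi = c(5, 6, 3, 7, 56)
)

# across() selects the matching columns; rowSums() adds them up per row
res <- df %>%
  mutate(
    klcea = rowSums(across(ends_with("klcea"))),
    ghcea = rowSums(across(contains("ghcea"))),
    dcea  = rowSums(across(contains("dcea")))
  ) %>%
  select(klcea, ghcea, dcea)

print(res)
```

The group names are still hard-coded here, just as in the original pipeline, so a new cea group would need its own line in mutate().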
Upvotes: 1
Reputation: 388972
Using base R, you can extract the "cea" part from each column name and pass it to split.default, which splits the data frame into groups of columns that share the same "cea" part. rowSums then sums each of these smaller data frames row by row.
sapply(split.default(df, sub('.*_(.*cea).*', '\\1', names(df))), rowSums)
#      dcea ghcea klcea
#1s.u1    5   457     2
#1s.u2    6     2     2
#2s.u1    3     3     0
#2s.u2    7    19   171
#6s.u1   56   115    68
where the sub part returns:
sub('.*_(.*cea).*', '\\1', names(df))
#[1] "klcea" "klcea" "ghcea" "ghcea" "dcea"
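To make the one-liner easier to follow, here is a step-by-step sketch of the same approach. It assumes the columns are numeric; the intermediate names key, groups, sums and result are only illustrative, and the final reordering is an optional extra step to match the column order asked for in the question.

```r
# Same data as in the question, with numeric values (assumed here)
df <- data.frame(
  row.names        = c("1s.u1", "1s.u2", "2s.u1", "2s.u2", "6s.u1"),
  fjri_deu_klcea   = c(0, 0, 0, 15, 23),
  hfue_klcea       = c(2, 2, 0, 156, 45),
  dji_dhi_ghcea_jk = c(456, 0, 0, 15, 15),
  jdi_jdi_ghcea    = c(1, 2, 3, 4, 100),
  gz7_jfu_dcea_jdi = c(5, 6, 3, 7, 56)
)

# Step 1: grouping key = the token ending in "cea" that follows an underscore
key <- sub(".*_(.*cea).*", "\\1", names(df))

# Step 2: split.default() treats the data frame as a list of columns
# and returns one smaller data frame per distinct key
groups <- split.default(df, key)

# Step 3: sum each group of columns row by row; sapply() binds the
# resulting vectors into a matrix with one column per key
sums <- sapply(groups, rowSums)

# Step 4 (optional): back to a data frame, in the order from the question
result <- as.data.frame(sums[, c("klcea", "ghcea", "dcea")])
print(result)
```

Because the key is derived from the column names, this version needs no changes if further cea groups are added, as long as they follow the same naming pattern.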
Upvotes: 1