rowsum based on groupings or conditions in r

Question

I want to compute row sums in R, grouped by column name.

I have more than 50 columns and have looked at various solutions, including this.

However, this doesn't really answer my question. I have column names such as total_2012Q1, total_2012Q2, total_2012Q3, total_2012Q4, ..., up to total_2014Q4, plus other character variables. I want to sum the quarterly columns within each row by year, so that I end up with three year columns: total_2012, total_2013, total_2014.

I don't want to call rowSums on a hand-picked range such as sample[,2:5]. Is there a way to sum them without manually going through column numbers? Also, split.default is an option, but if there are character variables as well, how do you restrict it to only the int variables you want to sum?

simple reproducible example (pre):

id total_2012Q1 total_2012Q2 total_2013Q1 total_2013Q2 char1 char2
 1         1231         5455         1534         2436    N     Y
 2         3948         1239          223          994    Y     N

reproducible example (post):

id total_2012 total_2013 char1 char2
 1       6686      3970     N     Y
 2       5187      1217     Y     N

Thanks for any suggestions.

Sotos · Accepted Answer

You can use split.default, i.e.

sapply(split.default(df, sub('^.*_([0-9]+)Q[0-9]', '\\1', names(df))), rowSums)
#     2012 2013
#[1,]    3   23
#[2,]    7   37
#[3,]    9   49

DATA:

dput(df)
structure(list(total_2012Q1 = c(1, 2, 3), total_2012Q2 = c(2, 
5, 6), total_2013Q1 = c(12, 15, 16), total_2013Q2 = c(11, 22, 
33)), class = "data.frame", row.names = c(NA, -3L))
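
The one-liner above assumes every column is a quarterly total. For the data in the question, which also has id and character columns, one possible sketch is to select the quarterly columns by name first and then apply the same split.default idea only to those. The names is_quarter, year_key, yearly and result are introduced here for illustration, and the pattern assumes the columns are named exactly total_<year>Q<quarter>.

```r
# Data copied from the "pre" table in the question.
df <- data.frame(
  id = 1:2,
  total_2012Q1 = c(1231, 3948),
  total_2012Q2 = c(5455, 1239),
  total_2013Q1 = c(1534, 223),
  total_2013Q2 = c(2436, 994),
  char1 = c("N", "Y"),
  char2 = c("Y", "N"),
  stringsAsFactors = FALSE
)

# Select the quarterly columns by name, so id and the character
# columns never reach rowSums (assumed naming: total_<year>Q<quarter>).
is_quarter <- grepl("^total_[0-9]{4}Q[1-4]$", names(df))

# Grouping key: drop the quarter suffix, e.g. "total_2012Q1" -> "total_2012".
year_key <- sub("Q[1-4]$", "", names(df)[is_quarter])

# Split the selected columns by year and sum each group row-wise.
yearly <- sapply(split.default(df[is_quarter], year_key), rowSums)

# Put the untouched columns back next to the yearly totals.
result <- cbind(df[!is_quarter], yearly)
print(result)
```

This gives the totals from the question's "post" table (6686 and 5187 for total_2012, 3970 and 1217 for total_2013), with the year columns placed after char1 and char2; reorder the columns afterwards if the exact layout matters.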
