AnnK

Reputation: 189

Sum rows at specific column intervals

I have a large data frame with 1129 rows and 4662 columns. I want to sum the row values over every block of 3 columns, and then return 1 for each block if its row sum is > 0, or 0 if the sum is < 1. I have added a small reproducible example below. In it, I would like to sum the row values of columns 1 to 3, and then the row values of columns 4 to 6 (and so on in my real data).

df <- read.table(text ="     2005-09-23_2005-09-26  2005-09-27_2005-10-30  2005-10-07_2005-10-08  2005-10-09_2005-10-10  2005-10-11_2005-10-12  2005-10-13_2005-10-14
1  1       0     1     1     1     1           
2  1       1     0     0     0     0     
3  NA      NA    NA     NA     NA     0", header = TRUE)

The result I am after would be this:

result <- read.table(text ="     2005-09-23_2005-10-08  2005-10-09_2005-10-14
1  1       1           
2  1       0     
3  NA      0", header = TRUE)

I looked for similar questions, and it seems that rollapply (R: summing over an interval of rows) or rowsum (R: summing over an interval of rows) could work, but I can't find a way to sum across blocks of columns instead of blocks of rows, nor how to repeat this over the whole data frame. Would someone be so kind as to help me with some code for doing this? Thank you very much!

Upvotes: 1

Views: 268

Answers (1)

Daniel O

Reputation: 4358

This works only if the number of columns is divisible by the interval.

+(sapply(split.default(df,unlist(lapply(1:(ncol(df)/3),rep,3))),rowSums) > 0)
   1  2
1  1  1
2  1  0
3 NA NA
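
Note that row 3 comes out as NA NA here, whereas the result requested in the question has NA 0 for that row. The following is a minimal sketch of one way to get there. It assumes the intent is to ignore missing values inside a block and to return NA only when every value in the block is missing. The helper name block_flag, the interval k, and the short column names a to f are placeholders for illustration, not part of the original post.

```r
# Stand-in for the question's example data (column names shortened to a..f)
df <- data.frame(a = c(1, 1, NA), b = c(0, 1, NA), c = c(1, 0, NA),
                 d = c(1, 0, NA), e = c(1, 0, NA), f = c(1, 0, 0))

k <- 3                                         # block width (assumed)
grp <- rep(seq_len(ncol(df) / k), each = k)    # 1 1 1 2 2 2

# Hypothetical helper: flag rows whose block sum is > 0, ignoring NAs
block_flag <- function(block) {
  s <- rowSums(block, na.rm = TRUE)            # sum the non-missing values
  s[rowSums(!is.na(block)) == 0] <- NA         # block entirely missing -> NA
  +(s > 0)                                     # logical -> 0/1 (NA stays NA)
}

res <- sapply(split.default(df, grp), block_flag)
res
```

On the example data this gives NA for the first block of row 3 (all values missing) and 0 for the second block (the only non-missing value is 0). Whether that is the right treatment of NA in the real data is a judgment call.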

Maybe someone else can find a more elegant way of creating the split than:
unlist(lapply(1:(ncol(df)/3),rep,3))
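
One possible simplification, offered as a sketch rather than a definitive answer: rep() with each = builds the same grouping vector directly, and integer division on the column positions also copes with a column count that is not a multiple of the interval (the last block is then simply shorter). The data frame, the short column names, and k below are placeholders for illustration.

```r
# Stand-in for the question's example data (column names shortened to a..f)
df <- data.frame(a = c(1, 1, NA), b = c(0, 1, NA), c = c(1, 0, NA),
                 d = c(1, 0, NA), e = c(1, 0, NA), f = c(1, 0, 0))
k <- 3  # block width (assumed)

grp_orig <- unlist(lapply(1:(ncol(df) / k), rep, k))  # grouping used above
grp_rep  <- rep(seq_len(ncol(df) / k), each = k)      # same vector via rep()
grp_div  <- (seq_along(df) - 1) %/% k + 1             # same vector via integer division

# With 7 columns the integer-division form still works: 1 1 1 2 2 2 3
grp7 <- (seq_len(7) - 1) %/% k + 1

# Same computation as above, using the integer-division grouping
res <- +(sapply(split.default(df, grp_div), rowSums) > 0)
res
```

The first two forms still require ncol(df) to be divisible by k. The integer-division form does not, but a trailing block with fewer than k columns is then summed on its own, which may or may not be what is wanted.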

Upvotes: 1
