Reputation: 33
I need to selectively add values across rows, from a subset of columns in my data frame. Values could take the integers 0, 1, 2, or 3. The sum of 2s and 3s (not the count of 2s and 3s, that's easy) gives the "intensity" of what we're measuring. Suppose my data frame was 60 columns long, and I wanted to look solely at the 2nd through 11th columns. The excel code to achieve this (one way) is pretty trivial: =sumif(a2:a11,">1")
. This would add all the 2s and 3s in that range.
While I've used aggregate()
in the past, most answers on this topic were not really looking to do what I was looking to do, and I wasn't able to find a way to get aggregate()
to function the way I needed. I was also baffled by trying to get rowSums()
to do this.
Eventually I realized I could do the following:
apply(data[,2:11], 1, function(x)sum(2*(x==2), 3*(x==3), na.rm=T))
This is by far the quickest and most elegant way I've found to replicate my particular desired behavior equivalent to Excel's sumif
.
If someone has an even better method, I'd be glad to hear it.
Upvotes: 1
Views: 283
Reputation: 32548
#DATA
set.seed(42)
df = data.frame(matrix(sample(0:3, 250, replace = TRUE), ncol = 50))
Here's how you'd do using rowSums()
rowSums(df[,2:11] * (df[,2:11] > 1), na.rm = TRUE)
#[1] 18 13 11 19 11
Compare with your solution
apply(df[,2:11], 1, function(x) sum(2*(x==2), 3*(x==3), na.rm=TRUE))
#[1] 18 13 11 19 11
Upvotes: 2
Reputation: 3210
A suggestion:
apply(data[,2:11], 1, function(x) sum(x[x %in% 2:3], na.rm = TRUE))
Upvotes: 1