ESKim
ESKim

Reputation: 432

rowsums with multiple conditions

I'm trying to calculate cumulative sum in rows with several variables.

This is my data as example. I have 5 patients ID, and 4 condition variables. If there is value between '1 to 3' in conditions, cumsum will be added 1.

ID<-c("a","b","c","d","e")
cond1<-as.factor(sample(x=1:7,size=5,replace=TRUE))
cond2<-as.factor(sample(x=1:7,size=5,replace=TRUE))
cond3<-as.factor(sample(x=1:7,size=5,replace=TRUE))
cond4<-as.factor(sample(x=1:7,size=5,replace=TRUE))
df<-data.frame(ID,cond1,cond2,cond3,cond4)
df

  ID cond1 cond2 cond3 cond4
1  a     2     7     6     6
2  b     7     2     3     6
3  c     4     3     1     4
4  d     7     3     3     6
5  e     6     7     7     3

I use rowSums code with following statement. However, as 2nd row, though cond2 is 2 and cond3 is 3, the cumsum was not '2', '1'. 4nd row has same problem.

df$cumsum<-rowSums(df[,2:5]==c(1,2,3),na.rm=TRUE)
df
  ID cond1 cond2 cond3 cond4 cumsum
1  a     2     7     6     6      0
2  b     7     2     3     6      1
3  c     4     3     1     4      1
4  d     7     3     3     6      1
5  e     6     7     7     3      0

How to make it cumulative? I would really appreciate all your help.

Upvotes: 1

Views: 1210

Answers (2)

PavoDive
PavoDive

Reputation: 6496

I suggest you fix two problems with your data:

  1. Your data is wide, instead of long formatted. Had your data been long formatted, your analysis would have been much simpler. This is specially true for plotting.
  2. Your values for each condition are factors. That makes it more difficult to do comparissons, and might induce some difficult-to-spot errors. If you see @akrun answer carefully, you'll notice the values are integer (numeric).

That said, I propose a data.table solution:

# 1. load libraries and make df a data.table:
library(data.table)
setDT(df)

# 2. make the wide table a long one
melt(df, id.vars = "ID")

# 3. with a long table, count the number of conditions that are in the 1:3 range for each ID. Notice I chained the first command with this second one:
melt(df, id.vars = "ID")[, sum(value %in% 1:3), by = ID]

Which produces the result:

   ID V1
1:  a  1
2:  b  2
3:  c  2
4:  d  2
5:  e  1

You'll only need to run commands under 1 and 3 (2 has been chained into 3). See ?data.table for further details.

You can read more about wide vs long in wikipedia and in Mike Wise's answer

The data I used is the same as @akrun:

df <- structure(list(ID = c("a", "b", "c", "d", "e"),
                          cond1 = c(2L, 7L, 4L, 7L, 6L), 
                          cond2 = c(7L, 2L, 3L, 3L, 7L), 
                          cond3 = c(6L, 3L, 1L, 3L, 7L), 
                          cond4 = c(6L, 6L, 4L, 6L, 3L)), 
               class = "data.frame", 
               row.names = c("1", "2", "3", "4", "5"))

Upvotes: 0

akrun
akrun

Reputation: 887541

For more than 1 element comparison, use %in%, but %in% works on a vector. So, we loop through the columns with lapply/sapply and then do the rowSums on the logical matrix

df$RSum <- rowSums(sapply(df[,2:5], `%in%`, 1:3))
df$RSum
#[1] 1 2 2 2 1

If the values were numeric, then we could also make use of > or <

df$RSum <- rowSums(df[, 2:5] >=1 & df[, 2:5] <=3)

data

df <- structure(list(ID = c("a", "b", "c", "d", "e"), cond1 = c(2L, 
7L, 4L, 7L, 6L), cond2 = c(7L, 2L, 3L, 3L, 7L), cond3 = c(6L, 
3L, 1L, 3L, 7L), cond4 = c(6L, 6L, 4L, 6L, 3L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

Upvotes: 1

Related Questions