Reputation: 432
I'm trying to calculate cumulative sum in rows with several variables.
This is my data as example. I have 5 patients ID, and 4 condition variables. If there is value between '1 to 3' in conditions, cumsum will be added 1.
ID<-c("a","b","c","d","e")
cond1<-as.factor(sample(x=1:7,size=5,replace=TRUE))
cond2<-as.factor(sample(x=1:7,size=5,replace=TRUE))
cond3<-as.factor(sample(x=1:7,size=5,replace=TRUE))
cond4<-as.factor(sample(x=1:7,size=5,replace=TRUE))
df<-data.frame(ID,cond1,cond2,cond3,cond4)
df
ID cond1 cond2 cond3 cond4
1 a 2 7 6 6
2 b 7 2 3 6
3 c 4 3 1 4
4 d 7 3 3 6
5 e 6 7 7 3
I use rowSums
code with following statement. However, as 2nd row, though cond2
is 2 and cond3
is 3, the cumsum
was not '2', '1'. 4nd row has same problem.
df$cumsum<-rowSums(df[,2:5]==c(1,2,3),na.rm=TRUE)
df
ID cond1 cond2 cond3 cond4 cumsum
1 a 2 7 6 6 0
2 b 7 2 3 6 1
3 c 4 3 1 4 1
4 d 7 3 3 6 1
5 e 6 7 7 3 0
How to make it cumulative? I would really appreciate all your help.
Upvotes: 1
Views: 1210
Reputation: 6496
I suggest you fix two problems with your data:
That said, I propose a data.table
solution:
# 1. load libraries and make df a data.table:
library(data.table)
setDT(df)
# 2. make the wide table a long one
melt(df, id.vars = "ID")
# 3. with a long table, count the number of conditions that are in the 1:3 range for each ID. Notice I chained the first command with this second one:
melt(df, id.vars = "ID")[, sum(value %in% 1:3), by = ID]
Which produces the result:
ID V1
1: a 1
2: b 2
3: c 2
4: d 2
5: e 1
You'll only need to run commands under 1 and 3 (2 has been chained into 3). See ?data.table
for further details.
You can read more about wide vs long in wikipedia and in Mike Wise's answer
The data I used is the same as @akrun:
df <- structure(list(ID = c("a", "b", "c", "d", "e"),
cond1 = c(2L, 7L, 4L, 7L, 6L),
cond2 = c(7L, 2L, 3L, 3L, 7L),
cond3 = c(6L, 3L, 1L, 3L, 7L),
cond4 = c(6L, 6L, 4L, 6L, 3L)),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5"))
Upvotes: 0
Reputation: 887541
For more than 1 element comparison, use %in%
, but %in%
works on a vector
. So, we loop through the columns with lapply/sapply
and then do the rowSums
on the logical matrix
df$RSum <- rowSums(sapply(df[,2:5], `%in%`, 1:3))
df$RSum
#[1] 1 2 2 2 1
If the values were numeric, then we could also make use of >
or <
df$RSum <- rowSums(df[, 2:5] >=1 & df[, 2:5] <=3)
df <- structure(list(ID = c("a", "b", "c", "d", "e"), cond1 = c(2L,
7L, 4L, 7L, 6L), cond2 = c(7L, 2L, 3L, 3L, 7L), cond3 = c(6L,
3L, 1L, 3L, 7L), cond4 = c(6L, 6L, 4L, 6L, 3L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
Upvotes: 1