Reputation: 203
I have a data set with multiple rows per patient, where each row represents a 1-week period over the course of 4 months. There is a variable grade that can take on values of 1, 2, or 3, and I want to detect when a single patient's grade INCREASES (1 to 2, 1 to 3, or 2 to 3) at any point (the result would be a yes/no variable). I could write a function to do it, but I'm betting there is some clever functional programming I could do to make use of existing R functions. Here is a sample data set. Thank you!
df=data.frame(patient=c(1,1,1,2,2,3,3,3,3),period=c(1,2,3,1,3,1,3,4,5),grade=c(1,1,1,2,3,1,1,2,3))
What I want is a resulting data frame like this:
data.frame(patient=c(1,2,3),grade.increase=c(0,1,1))
Upvotes: 1
Views: 1466
Reputation: 5335
If you feel like doing this in base R, here's a solution that uses the split-apply-combine approach: split() to make a list with a separate data frame for each patient; lapply() to iterate a summarization function over each list element, where the summarization function uses diff() to look at the changes in grade and any() plus if to turn them into 1 or 0; and then do.call(rbind, ...) to collapse the resulting list into a data frame. Because split() keeps the original row order, this assumes the rows are already sorted by period within each patient, as they are in the sample data. Here's what that looks like:
do.call(rbind, lapply(split(df, df[, "patient"]), function(i) {
  # one row per patient: 1 if the grade ever rises between consecutive rows, else 0
  data.frame(patient = i[, "patient"][1],
             grade.increase = if (any(diff(i[, "grade"]) > 0)) 1 else 0)
}))
Result:
  patient grade.increase
1       1              0
2       2              1
3       3              1
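To see what each stage produces, here is the same pipeline unrolled into named steps on the question's sample data. The intermediate names (by_patient, summaries, result) are only illustrative, and the explicit sort by period is an added assumption, since split() keeps whatever row order the data already has.

```r
# Sample data from the question
df <- data.frame(patient = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 period  = c(1, 2, 3, 1, 3, 1, 3, 4, 5),
                 grade   = c(1, 1, 1, 2, 3, 1, 1, 2, 3))

# Assumption: sort by period within patient so diff() compares consecutive weeks
df <- df[order(df$patient, df$period), ]

# Split: a named list with one data frame per patient ("1", "2", "3")
by_patient <- split(df, df[, "patient"])

# diff() gives the change between consecutive rows; for patient 3 that is 0 1 1
print(diff(by_patient[["3"]][, "grade"]))

# Apply: any positive difference means the grade went up at some point
summaries <- lapply(by_patient, function(i) {
  data.frame(patient = i[, "patient"][1],
             grade.increase = if (any(diff(i[, "grade"]) > 0)) 1 else 0)
})

# Combine: stack the one-row data frames into a single data frame
result <- do.call(rbind, summaries)
print(result)
```

Keeping the steps separate like this makes it easy to inspect a single patient's slice (for example by_patient[["2"]]) when a result looks wrong.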
Upvotes: 0
Reputation: 1177
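library(dplyr)
df %>%
  arrange(patient, period) %>%   # put rows in time order within each patient
  group_by(patient) %>%          # group first so lag() never reaches into another patient's rows
  mutate(grade.increase = case_when(grade > lag(grade) ~ TRUE, TRUE ~ FALSE)) %>%
  summarise(grade.increase = max(grade.increase))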
Combining lag(), which returns the previous value, with case_when() flags each row where the grade is higher than in the row before it. The first row has no previous value, so its comparison is NA and falls through to FALSE.
Summarising the maximum of grade.increase for each patient then gives the desired result, because max() treats FALSE as 0 and TRUE as 1.
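The lag/case_when/max reasoning can be checked on a single patient's grades without dplyr. The helper increased() below is hypothetical and only mimics those steps in base R, assuming the grades are already sorted by period; it is not part of dplyr.

```r
# Hypothetical helper: base R equivalent of the lag()/case_when()/max() steps
# for one patient's grades, assumed to be sorted by period
increased <- function(g) {
  # Stand-in for dplyr::lag(): shift right by one, padding the first slot with NA
  prev <- c(NA, head(g, -1))
  # Like case_when(grade > lag(grade) ~ TRUE, TRUE ~ FALSE):
  # the NA comparison on the first row ends up FALSE
  rose <- !is.na(prev) & g > prev
  # max() on a logical vector treats FALSE as 0 and TRUE as 1
  max(rose)
}

increased(c(1, 1, 1))     # patient 1 -> 0
increased(c(2, 3))        # patient 2 -> 1
increased(c(1, 1, 2, 3))  # patient 3 -> 1
```

A decreasing sequence such as c(3, 2, 1) also returns 0, since only rises are flagged.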
Upvotes: 4