Count number of changes in categorical variables during repeated measurements

Question

I have observed a number of subjects over 2-5 years and asked each year whether they had a specific symptom ("yes" or "no"). I want to count how many times this symptom state changed within each subject, i.e. the number of shifts (from "no" to "yes" or from "yes" to "no") during the observation period (year 1 to year 5). Unfortunately, I have some NAs where the subject did not answer. These NAs should be ignored.

subject <- c("a", "b", "c", "d")
year1 <- c("no", "yes", NA, NA)
year2 <- c("yes", "yes", NA, "yes")
year3 <- c("no", "yes", "yes", NA)
year4 <- c("yes", "yes", NA, "no")
year5 <- c("yes", "yes", "yes", NA)
df <- data.frame(subject, year1, year2, year3, year4, year5)
df

How do I create the new numerical variable "df$shifts" [Number of shifts(n)]? In this example, "df$shifts" should become 3,0,0,1.

akrun · Accepted Answer

We can loop over the rows with apply, take the rle (run-length encoding) of the non-NA elements, extract its 'values', count the adjacent values that are not equal, and assign the result to a new column 'shifts'.

df$shifts <- apply(df[-1], 1, function(x) {
  x1 <- rle(x[!is.na(x)])$values   # collapse runs of equal non-NA answers
  sum(x1[-1] != x1[-length(x1)])   # count adjacent values that differ
})
df$shifts
#[1] 3 0 0 1
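
To see what happens inside each row, here is a minimal sketch that walks through subject "a" step by step and then wraps the same idea in a small helper. The name count_shifts is hypothetical (it is not part of the answer above), and the helper compares adjacent non-NA answers directly instead of going through rle, which should give the same count under the assumption that NAs are simply dropped.

```r
# Rebuild the example data from the question
subject <- c("a", "b", "c", "d")
year1 <- c("no", "yes", NA, NA)
year2 <- c("yes", "yes", NA, "yes")
year3 <- c("no", "yes", "yes", NA)
year4 <- c("yes", "yes", NA, "no")
year5 <- c("yes", "yes", "yes", NA)
df <- data.frame(subject, year1, year2, year3, year4, year5)

# Step by step for subject "a" (first row, year columns only)
a <- as.matrix(df[-1])[1, ]      # character vector of the five answers
a_obs <- a[!is.na(a)]            # drop unanswered years
a_runs <- rle(a_obs)$values      # one entry per run of equal answers

# count_shifts is a hypothetical helper name, used here for illustration only
count_shifts <- function(x) {
  x <- x[!is.na(x)]                # ignore NAs
  if (length(x) < 2) return(0L)    # fewer than two answers: no shift possible
  sum(x[-1] != x[-length(x)])      # compare each answer with the previous one
}

shifts <- apply(df[-1], 1, count_shifts)
```

For the example data, shifts holds 3, 0, 0, 1, matching the expected result. A subject with no answers at all would get 0 shifts.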
