Christina Sobotzki

Compare values in sequential rows in a longitudinal dataset

I have a longitudinal dataset with mistakes in a date variable. Here is an example:

For ID 1, the first row has the visit date 2013-07-17, which is 321 days after the study start (2012-08-29). The next row has the visit date 2013-02-15, which is 169 days after the study start. Because the visits are recorded in ascending order, the date 2013-07-17 must be wrong.

I tried:

dat$DifferenceDateerror <- "no"

i <- 1
for(i in 1:nrow(dat)){
  if(dat[i,"DifferenceDate"] > dat[i+1,"DifferenceDate"] & !is.na(dat$DifferenceDate)[i])
  {dat$DifferenceDateerror[i]=="yes"}
}

but got the following error:

Error in if (dat[i, "DifferenceDate"] > dat[i + 1, : missing value where TRUE/FALSE needed
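The error comes from the last pass through the loop: when i equals nrow(dat), dat[i + 1, "DifferenceDate"] is NA, so the condition handed to if() is NA instead of TRUE or FALSE (the same happens whenever the next row's DifferenceDate is NA). Separately, dat$DifferenceDateerror[i] == "yes" only compares and never assigns. Below is a minimal sketch of a corrected loop on a small made-up data frame; the ID column name and the ID 2 values are assumptions, and only 321 and 169 come from the question. It also skips pairs that cross from one ID to the next, assuming ascending order only holds within a participant.

```r
# Hypothetical toy data: only the two ID 1 values come from the question
dat <- data.frame(ID = c(1, 1, 2, 2),
                  DifferenceDate = c(321, 169, 10, NA))

dat$DifferenceDateerror <- "no"
# Stop one row early: the last row has no successor to compare with
for (i in seq_len(nrow(dat) - 1)) {
  cur <- dat$DifferenceDate[i]
  nxt <- dat$DifferenceDate[i + 1]
  # Compare only within the same ID, and skip pairs that contain NA
  if (dat$ID[i] == dat$ID[i + 1] && !is.na(cur) && !is.na(nxt) && cur > nxt) {
    dat$DifferenceDateerror[i] <- "yes"  # `<-` assigns; `==` only compares
  }
}
dat$DifferenceDateerror
```

With this toy data only the first row (321 followed by 169) is marked "yes".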

I would like to flag the rows where the date must be wrong.

Answers (1)

Ronak Shah

Since you want to mark a row "yes" when its DifferenceDate is greater than the one in the next row, you can use diff to compare consecutive values and assign the labels in a single vectorised step.

# Flag row i when its DifferenceDate is larger than the one in row i + 1
dat$DifferenceDateerror <- c("no", "yes")[c(diff(dat$DifferenceDate) < 0, FALSE) + 1]
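It helps to check which row diff actually lines up with: diff(x)[i] is x[i + 1] - x[i], so diff(x) < 0 is TRUE at position i exactly when row i is larger than row i + 1. Appending FALSE for the last row, which has no successor, therefore flags row i itself (the 2013-07-17 visit in the question), while prepending it would shift the flag onto row i + 1. The sketch uses 321 and 169 from the question plus a made-up third value of 400.

```r
# 321 and 169 come from the question; 400 is a made-up third visit
x <- c(321, 169, 400)
diff(x)              # x[i + 1] - x[i]: -152, 231
d <- diff(x) < 0     # TRUE at position i when row i is larger than row i + 1
flag <- c(d, FALSE)  # the last row has no successor, so it is never flagged
c("no", "yes")[flag + 1]
```

This returns "yes", "no", "no": only the 321 row is marked.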

Or, equivalently, with head and tail:

x <- dat$DifferenceDate
dat$DifferenceDateerror <- c("no", "yes")[c(head(x, -1) > tail(x, -1), FALSE) + 1]
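Both one-liners work on the whole column at once, so the last visit of one ID is also compared with the first visit of the next ID, and an NA in DifferenceDate gives NA instead of "no". Below is a minimal sketch that restricts the comparison to rows of the same participant, using base R's ave(). The ID column name and every value other than 321 and 169 are assumptions. Treating comparisons involving a missing value as "no" is also a choice the question does not specify.

```r
# Hypothetical data: the ID column name and the ID 2 values are assumptions
dat <- data.frame(ID = c(1, 1, 2, 2, 2),
                  DifferenceDate = c(321, 169, 5, NA, 3))

# TRUE when a value is larger than the next value within the same ID;
# the last visit of each ID has nothing to compare with and gets FALSE
next_is_smaller <- function(x) c(diff(x) < 0, FALSE)

# ave() applies the function to each ID separately and keeps the row order;
# because DifferenceDate is numeric, the flags come back as 1 / 0 / NA
flag <- ave(dat$DifferenceDate, dat$ID, FUN = next_is_smaller)

# Missing comparisons (NA) are treated as "no"
dat$DifferenceDateerror <- ifelse(flag %in% 1, "yes", "no")
dat$DifferenceDateerror
```

Since ave() only groups rows, the rows of one ID do not have to be contiguous, but they do have to be in visit order. A missing value between two visits hides the comparison across it: in this toy data, 5 followed by NA and then 3 is not flagged.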
