Rlearner

Reputation: 21

R: Having trouble writing a for loop to lag a date

I am attempting to write a for loop which will take subsets of a dataframe by person id and then lag the EXAMDATE variable by one for comparison. So a given row will have the original EXAMDATE and also a variable EXAMDATE_LAG which will contain the value of the EXAMDATE one row before it.

for (i in length(uniquerid))
{
    temp <- subset(part2test, RID==uniquerid[i])
    temp$EXAMDATE_LAG <- temp$EXAMDATE
    temp2 <- data.frame(lag(temp, -1, na.pad=TRUE))  
    temp3 <- data.frame(cbind(temp,temp2))
}

It seems that I am creating the new variable just fine, but I know that the lag won't work properly because I am missing steps. Perhaps I have also misunderstood other people's examples of how to use the lag function?

Upvotes: 2

Views: 701

Answers (1)

Justin

Reputation: 43255

So that this can be fully answered: there are a handful of things wrong with your code. Lucaino has pointed one out. Another is that each time through your loop you create temp, temp2, and temp3 (overwriting the old ones), so you'll be left with only the output of the last pass through the loop.
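If you did want to keep the loop, one way to hold on to every pass is to store each subset in a list and bind them at the end. This is only a sketch: the toy part2test data and the results and lagged names are invented for illustration, and seq_along() is used on the assumption that the loop should visit every id.

```r
# Toy data standing in for the asker's part2test (values are invented).
part2test <- data.frame(
  RID = c(1, 1, 1, 2, 2),
  EXAMDATE = as.Date(c("2010-01-01", "2010-06-01", "2011-01-01",
                       "2010-03-01", "2010-09-01"))
)
uniquerid <- unique(part2test$RID)

# Collect one data frame per id instead of overwriting temp on each pass.
results <- vector("list", length(uniquerid))
for (i in seq_along(uniquerid)) {
  temp <- subset(part2test, RID == uniquerid[i])
  n <- nrow(temp)
  # Previous row's date; the first row of each id has no predecessor, so NA.
  temp$EXAMDATE_LAG <- temp$EXAMDATE[c(NA_integer_, seq_len(n - 1))]
  results[[i]] <- temp
}

# Stack the per-id pieces back into a single data frame.
lagged <- do.call(rbind, results)
```

Indexing with a leading NA_integer_ is used here rather than c(NA, ...) because it keeps the Date class of EXAMDATE.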

However, this isn't something that needs a loop. Instead, you can make use of the vectorized nature of R:

> x <- 1:10
> c(x[-1], NA)
 [1]  2  3  4  5  6  7  8  9 10 NA
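Note the direction of the shift: c(x[-1], NA) gives each position the value that follows it. If what you want is the value from the row before, the shift goes the other way. A small sketch of both, with variable names made up for illustration:

```r
x <- 1:10

# Shift up: each position gets the NEXT element (what the snippet above does).
next_value <- c(x[-1], NA)

# Shift down: each position gets the PREVIOUS element.
prev_value <- c(NA, x[-length(x)])

# For Date vectors, indexing with a leading NA keeps the Date class,
# whereas c(NA, dates) would return plain numbers instead of dates.
dates <- as.Date(c("2010-01-01", "2010-06-01", "2011-01-01"))
prev_date <- dates[c(NA_integer_, seq_len(length(dates) - 1))]
```

Either form is vectorized, so it can be dropped into whatever per-id grouping you end up using.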

So if you combine that notion with a library like plyr that splits data nicely you should have a workable solution. If I've missed something or this doesn't solve your problem, please provide a reproducible example.

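library(plyr)

# Drop the first element and pad the end with NA, so each row receives
# the value from the row after it.
myLag <- function(x) {
  c(x[-1], NA)
}

# Group by the RID column of part2test (uniquerid is a separate vector, not a column).
ddply(part2test, .(RID), transform, EXAMDATE_LAG=myLag(EXAMDATE))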

You could also do this in base R using split or the data.table package using its by= argument.
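The base R route might look roughly like the sketch below. The toy data and the lag_one helper are hypothetical, and it assumes you want the previous row's EXAMDATE within each RID.

```r
# Toy data standing in for part2test (values are invented).
part2test <- data.frame(
  RID = c(1, 1, 1, 2, 2),
  EXAMDATE = as.Date(c("2010-01-01", "2010-06-01", "2011-01-01",
                       "2010-03-01", "2010-09-01"))
)

# Hypothetical helper: previous element, NA first, class of x preserved.
lag_one <- function(x) x[c(NA_integer_, seq_len(length(x) - 1))]

# Split into one data frame per RID, add the lagged column to each piece.
pieces <- split(part2test, part2test$RID)
pieces <- lapply(pieces, function(d) {
  d$EXAMDATE_LAG <- lag_one(d$EXAMDATE)
  d
})

# Recombine; rows come back grouped by RID.
lagged <- do.call(rbind, pieces)
rownames(lagged) <- NULL
```

This is the same split, apply, combine pattern that ddply wraps up for you, just spelled out by hand.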

Upvotes: 1
