Reputation: 167

Finding the vectorized way to perform for-loops with calculation between rows

I'm trying to find a vectorized procedure that can replace the following code (which takes a long time to run):

for (i in 2:nrow(z)) {
  if (z$customerID[i] == z$customerID[i - 1]) {
    z$timeDelta[i] <- z$time[i] - z$time[i - 1]
  } else {
    z$timeDelta[i] <- NA
  }
}

I tried looking for different apply snippets, but haven't found anything useful.

Here's some sample data:

customerID    time
    1         2013-04-17 15:30:00 IDT
    1         2013-05-19 11:32:00 IDT
    1         2013-05-20 10:14:00 IDT
    2         2013-03-14 18:41:00 IST
    2         2013-04-24 09:52:00 IDT
    2         2013-04-24 17:08:00 IDT

And I want to get the following output:

customerID    time                        timeDelta*
    1         2013-04-17 15:30:00 IDT     NA
    1         2013-05-19 11:32:00 IDT     31.83 
    1         2013-05-20 10:14:00 IDT     0.94 
    2         2013-03-14 18:41:00 IST     NA
    2         2013-04-24 09:52:00 IDT     40.59
    2         2013-04-24 17:08:00 IDT     0.3 

 * I would prefer timeDelta to be in days.

Upvotes: 4

Answers (4)

jdharrison

Reputation: 30425

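z$timeDelta <- NA
# Request days explicitly; the units diff() picks on its own depend on the data
z$timeDelta[-1] <- ifelse(tail(z$customerID, -1) == head(z$customerID, -1),
                          as.numeric(diff(z$time), units = "days"), NA)

Or a shorter version, which works when customerID is numeric:

z$timeDelta <- NA
z$timeDelta[-1] <- ifelse(!diff(z$customerID),
                          as.numeric(diff(z$time), units = "days"), NA)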
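As a worked illustration, here is a minimal sketch of the same idea run on the sample data from the question. diff() on date-times returns a difftime whose units are chosen automatically, so dividing by 24 only gives days when those units happen to be hours; the sketch asks for days explicitly instead. The tz = "UTC" setting and the helper names gap_days and same_customer are assumptions made for the example, not part of the original answer.

```r
# Sample data from the question; tz = "UTC" is an assumption for reproducibility
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# Gap between each row and the row above it, explicitly in days
gap_days <- as.numeric(diff(z$time), units = "days")

# TRUE where a row belongs to the same customer as the row above it
same_customer <- diff(z$customerID) == 0

# The first row has no predecessor, so it is always NA
z$timeDelta <- c(NA, ifelse(same_customer, gap_days, NA))
```

Like the loop in the question, this assumes the rows are already grouped by customerID and ordered by time within each customer.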

Upvotes: 10

Wojciech Sobala

Reputation: 7561

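With some help from firstobs() in the doBy package:

library(doBy)

# as.numeric(..., units = "days") keeps the result in days whatever units diff() picks
z$timeDelta <- c(NA, as.numeric(diff(z$time), units = "days"))
z$timeDelta[firstobs(z$customerID)] <- NA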
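If adding a package dependency is undesirable, the same "blank out the first row of each customer" step can probably be done in base R with duplicated(). The following is a sketch under that assumption, not the answerer's method; the tz = "UTC" setting is also an assumption made for reproducibility.

```r
# Sample data from the question; tz = "UTC" is an assumption for reproducibility
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# Row-to-row gaps in days, padded with NA for the very first row
z$timeDelta <- c(NA, as.numeric(diff(z$time), units = "days"))

# !duplicated() is TRUE at the first occurrence of each customerID,
# which is exactly where the gap would span two different customers
z$timeDelta[!duplicated(z$customerID)] <- NA
```

This relies on each customer's rows being contiguous, as they are in the sample data.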

Upvotes: 1

Thomas

Reputation: 44525

This should work for you:

# z is the data frame from the question
do.call(rbind, lapply(split(z, z$customerID), function(df)
    within(df, timeDelta <- c(NA, as.numeric(diff(time), units = "days")))))

Result:

    customerID                time  timeDelta
1.1          1 2013-04-17 15:30:00         NA
1.2          1 2013-05-19 11:32:00 31.8347222
1.3          1 2013-05-20 10:14:00  0.9458333
2.4          2 2013-03-14 18:41:00         NA
2.5          2 2013-04-24 09:52:00 40.5909722
2.6          2 2013-04-24 17:08:00  0.3027778
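
If the compound row names (1.1, 1.2, ...) that split() and rbind() produce are not wanted, one possible alternative is ave(), which applies a function within each group and returns the results in the original row order. This is a sketch rather than the answerer's approach; the tz = "UTC" setting and the argument name secs are assumptions made for the example.

```r
# Sample data from the question; tz = "UTC" is an assumption for reproducibility
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# as.numeric() on POSIXct gives seconds since the epoch, so the per-group
# differences are in seconds; dividing by 86400 converts them to days
z$timeDelta <- ave(as.numeric(z$time), z$customerID,
                   FUN = function(secs) c(NA, diff(secs))) / 86400
```

Because the grouping is done by ave(), the rows do not need to be contiguous per customer, only ordered by time within each customer.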

Upvotes: 2

Tyler Rinker

Reputation: 109874

This works:

## z <- read.table(text="customerID    time
##     1         2013-04-17.15:30:00.IDT
##     1         2013-05-19.11:32:00.IDT
##     1         2013-05-20.10:14:00.IDT
##     2         2013-03-14.18:41:00.IST
##     2         2013-04-24.09:52:00.IDT
##     2         2013-04-24.17:08:00.IDT", header=TRUE)
## 
## z$time <- as.POSIXlt(gsub("\\.", " ", z$time))


do.call(rbind, lapply(split(z, z$customerID), function(x) {
    x$timeDelta <- c(NA, round(as.numeric(diff(x$time), units = "days"), 2))
    x
}))

##     customerID                time timeDelta
## 1.1          1 2013-04-17 15:30:00        NA
## 1.2          1 2013-05-19 11:32:00     31.83
## 1.3          1 2013-05-20 10:14:00      0.95
## 2.4          2 2013-03-14 18:41:00        NA
## 2.5          2 2013-04-24 09:52:00     40.63
## 2.6          2 2013-04-24 17:08:00      0.30
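
The 40.63 above differs from the 40.59 in the question's expected output. A likely explanation is time zone handling: the sample data moves from IST to IDT between the two timestamps, and the setup above drops those labels when parsing. The sketch below compares the two readings, assuming the labels correspond to the "Asia/Jerusalem" zone (an assumption, since the question does not name the zone) and that the system's time zone database is available.

```r
# Interpreted in the assumed local zone, where the clock change is accounted for
start_local <- as.POSIXct("2013-03-14 18:41:00", tz = "Asia/Jerusalem")
end_local   <- as.POSIXct("2013-04-24 09:52:00", tz = "Asia/Jerusalem")

# Interpreted as plain wall-clock times with no clock change
start_utc <- as.POSIXct("2013-03-14 18:41:00", tz = "UTC")
end_utc   <- as.POSIXct("2013-04-24 09:52:00", tz = "UTC")

days_local <- as.numeric(difftime(end_local, start_local, units = "days"))
days_utc   <- as.numeric(difftime(end_utc, start_utc, units = "days"))

# The two readings differ by the one-hour clock change, i.e. 1/24 of a day
days_utc - days_local
```

Which value is right depends on whether elapsed time or wall-clock time is wanted; parsing with an explicit tz makes that choice visible.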

Upvotes: 2
