Drew Steen

Reputation: 16607

Calculate elapsed "times", where the reference time depends on a factor

I'm trying to calculate elapsed times in a data frame, where the 'start' value for the elapsed time depends on the value of a factor column in the data frame. (To simplify the question, I'll treat the time values as numeric rather than time objects - my question is about split-apply-combine, not time objects.) My data frame looks like this:

df <- data.frame(id=gl(2, 3, 5, labels=c("a", "b")), time=1:5)

I'd like to calculate elapsed times by subtracting the minimum time in each factor level from each time. So I'd like to split the data frame by id, subtract the minimum time value from each element in the time column, and return a vector (or data frame) with the transformed values. I want to end up with something like:

> dfTrans
id  time  elapsed
a      1        0
a      2        1
a      3        2
b      4        0
b      5        1   

Seems like a perfect task for plyr, but I can't find a simple solution.

The best I can come up with is

elapsed <- dlply(df, .(id), function(x) x$time - min(x$time))
elapsed_comb <- NA
for(i in 1:length(names(elapsed))) {
  elapsed_comb <- c(elapsed_comb, elapsed[[i]])
}
elapsed_comb <- elapsed_comb[-1]
df$elapsed <- elapsed_comb

This is inelegant, and seems fragile. Surely there's a better way?

Upvotes: 0

Views: 278

Answers (2)

IRTFM

Reputation: 263372

The 'ave' function is the first thing you should think of when the result is to be a vector with the same length as the number of rows in the data frame:

 df$elapsed <- ave(df$time, df$id, FUN = function(x) x - min(x))
 df
  id time elapsed
1  a    1       0
2  a    2       1
3  a    3       2
4  b    4       0
5  b    5       1
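
A small sketch of why this is less fragile than concatenating pieces by hand: ave() returns its result in the original row order, so it should also work when the ids are interleaved and the times are not consecutive. The data frame df2 below is made up purely for illustration.

```r
# Hypothetical data: ids interleaved, times not consecutive
df2 <- data.frame(id   = c("b", "a", "b", "a", "a"),
                  time = c(10, 3, 14, 7, 4))

# Subtract each group's minimum time; the result lines up with the rows of df2
df2$elapsed <- ave(df2$time, df2$id, FUN = function(x) x - min(x))
df2
```

Each row is measured against the minimum of its own id (3 for "a", 10 for "b"), and no sorting or re-ordering step is needed.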

Upvotes: 3

Dason

Reputation: 61953

Here is a ddply solution

ddply(df, .(id), summarize, time = time, elapsed = seq(length(id))-1)

and one using rle instead

df$elapsed <- unlist(sapply(rle(as.numeric(df$id))$lengths, seq))-1
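
Note that both snippets above number the rows within each id, which matches the elapsed time only when the times inside a group increase in steps of 1 (as they do in the example data). If the times are irregular, a sketch along the following lines subtracts the group minimum directly; it assumes plyr is installed, and df3 is made-up data for illustration.

```r
library(plyr)

# Hypothetical data with irregular gaps between times
df3 <- data.frame(id   = gl(2, 3, 5, labels = c("a", "b")),
                  time = c(1, 4, 9, 20, 26))

# transform adds a column computed within each id group
out <- ddply(df3, .(id), transform, elapsed = time - min(time))
out
```

Using transform rather than summarize keeps the existing columns and adds elapsed next to them, so time does not have to be passed through by hand.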

Upvotes: 2
