Reputation: 16607
I'm trying to calculate elapsed times in a data frame, where the 'start' value for the elapsed time depends on the value of a factor column in the data frame. (To simplify the question, I'll treat the time values as numeric rather than time objects; my question is about split-apply-combine, not time objects.) My data frame looks like this:
df <- data.frame(id=gl(2, 3, 5, labels=c("a", "b")), time=1:5)
I'd like to calculate elapsed times by subtracting the minimum time in each factor level from each time. So I'd like to split the data frame by id, subtract the minimum time value from each element in the time column, and return a vector (or data frame) with the transformed values. I want to end up with something like:
> dfTrans
id time elapsed
a 1 0
a 2 1
a 3 2
b 4 0
b 5 1
Seems like a perfect task for plyr, but I can't find a simple solution.
The best I can come up with is
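library(plyr)
# one vector of elapsed times per level of id
elapsed <- dlply(df, .(id), function(x) x$time - min(x$time))
# concatenate the per-group vectors back into one vector
elapsed_comb <- NA
for(i in seq_along(elapsed)) {
  elapsed_comb <- c(elapsed_comb, elapsed[[i]])
}
elapsed_comb <- elapsed_comb[-1]  # drop the NA placeholder
df$elapsed <- elapsed_comb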
This is inelegant, and seems fragile (it only lines up with the rows if df is already sorted by id). Surely there's a better way?
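For comparison, the same split-apply-combine can be sketched in base R without plyr: split() the time column by id, subtract the minimum within each piece using lapply(), and let unsplit() put the pieces back in the original row order. This is a minimal sketch that rebuilds the example df; because unsplit() places each value back by id, it does not rely on the rows being sorted.

```r
# Rebuild the example data frame from the question
df <- data.frame(id = gl(2, 3, 5, labels = c("a", "b")), time = 1:5)

# Split: one vector of times per level of id
pieces <- split(df$time, df$id)

# Apply: subtract each group's minimum time
pieces <- lapply(pieces, function(x) x - min(x))

# Combine: unsplit() reassembles the pieces in the original row order
df$elapsed <- unsplit(pieces, df$id)
df
```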
Upvotes: 0
Views: 278
Reputation: 263372
The 'ave' function is the first thing you should think of when the result is to be a vector with the same length as the number of rows in the data frame:
df$elapsed <- ave(df$time, df$id, FUN = function(x) x - min(x))
df
id time elapsed
1 a 1 0
2 a 2 1
3 a 3 2
4 b 4 0
5 b 5 1
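A quick sketch of why ave() is robust here: it returns a vector aligned with the input rows, so it still works when the ids are interleaved rather than sorted. The shuffled row order below is made up purely for illustration.

```r
# Same example data, but with the rows shuffled so the ids are interleaved
df <- data.frame(id = gl(2, 3, 5, labels = c("a", "b")), time = 1:5)
shuffled <- df[c(4, 1, 5, 3, 2), ]

# ave() computes per-group results and writes them back to the matching rows
shuffled$elapsed <- ave(shuffled$time, shuffled$id, FUN = function(x) x - min(x))
shuffled
```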
Upvotes: 3
Reputation: 61953
Here is a ddply solution
ddply(df, .(id), summarize, time = time, elapsed = seq(length(id))-1)
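and one using rle instead (lapply rather than sapply, so unlist() always gets a list, even when every group has the same size and sapply would have simplified to a matrix)
df$elapsed <- unlist(lapply(rle(as.numeric(df$id))$lengths, seq)) - 1
Both of these count rows within each id rather than subtracting min(time), so they match the expected output only because the example times go up by 1 (and the rle version also assumes the rows are sorted by id).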
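To see how row counting and elapsed time diverge, here is a small sketch with hypothetical, irregularly spaced times (df2 and its values are made up for illustration). It uses sequence(), which concatenates seq_len() over each run length, as a compact form of the rle counting idea.

```r
# Hypothetical irregular times: same ids as the question, but uneven gaps
df2 <- data.frame(id = gl(2, 3, 5, labels = c("a", "b")),
                  time = c(1, 4, 10, 2, 7))

# Elapsed time proper: subtract each group's minimum time
df2$elapsed <- ave(df2$time, df2$id, FUN = function(x) x - min(x))

# Position within each run of id (0-based), as the counting approaches compute it
df2$position <- sequence(rle(as.integer(df2$id))$lengths) - 1
df2
```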
Upvotes: 2