Reputation: 2534
I am working with time series data and want to calculate the change in time and in value between the first and final measurements for each individual, and put these numbers into a new, simpler dataframe. For example, for this dataframe:
test <- structure(list(time = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
    indv = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
    value = c(1L, 3L, 5L, 8L, 3L, 4L, 7L, 8L)),
    .Names = c("time", "indv", "value"), class = "data.frame",
    row.names = c(NA, -8L))
or
time indv value
1 1 1
2 1 3
3 1 5
4 1 8
1 2 3
2 2 4
3 2 7
4 2 8
I can use this code
library(plyr)
ddply(test, .(indv), transform,
      value_change = value[length(value)] - value[1],
      time_change = time[length(time)] - time[1])
to give
time indv value value_change time_change
1 1 1 7 3
2 1 3 7 3
3 1 5 7 3
4 1 8 7 3
1 2 3 5 3
2 2 4 5 3
3 2 7 5 3
4 2 8 5 3
However, I would like to eliminate the redundant rows and make a new, simpler dataframe like this:
indv time_change value_change
1 3 7
2 3 5
Does anyone have any clever way to do this?
Thanks!
Upvotes: 2
Views: 313
Reputation: 89057
Just replace transform with summarize. You can also make your code a little prettier by using head and tail:
ddply(test, .(indv), summarize,
      value_change = tail(value, 1) - head(value, 1),
      time_change = tail(time, 1) - head(time, 1))
For maximum readability, write a function:
change <- function(x) tail(x, 1) - head(x, 1)
ddply(test, .(indv), summarize,
      value_change = change(value),
      time_change = change(time))
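One caveat: head and tail pick the first and last rows as stored, so the approach above relies on each individual's rows already being in time order. The following is a minimal sketch, not part of the original answer, that sorts first. The names shuffled, sorted, and result are made up for illustration, and the row shuffle is only there to show the problem.

```r
library(plyr)

# Example data from the question.
test <- data.frame(
  time  = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  indv  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  value = c(1L, 3L, 5L, 8L, 3L, 4L, 7L, 8L)
)

# Hypothetical: the same rows in a scrambled order.
shuffled <- test[c(4, 1, 3, 2, 8, 6, 5, 7), ]

# Last element minus first element, as in the answer.
change <- function(x) tail(x, 1) - head(x, 1)

# Sort by individual, then by time, so first/last really mean earliest/latest.
sorted <- shuffled[order(shuffled$indv, shuffled$time), ]

result <- ddply(sorted, .(indv), summarize,
                time_change  = change(time),
                value_change = change(value))
print(result)
```

With the sort in place, the result should match the table requested in the question (time_change of 3 for both individuals, value_change of 7 and 5) regardless of the original row order.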
Upvotes: 2