Reputation: 18580
I have a revision data frame with 3 columns:
For instance here is a sample, but mine is very very long (several hundred of thousands of rows)
df = structure(list(revisionTime = structure(c(1471417781, 1471417781,
1471417781, 1473978576, 1473978576, 1473978576), class = c("POSIXct",
"POSIXt"), tzone = ""), date = structure(c(1464652800, 1467244800,
1469923200, 1456704000, 1467244800, 1472601600), class = c("POSIXct",
"POSIXt"), tzone = ""), value = c(103.7, 104.1, 104.9, 104.414,
104.3, 104.4)), .Names = c("revisionTime", "date", "value"), row.names = 536:541, class = "data.frame")
What I need is a very fast way to extract from this data.frame the latest revisionTime for each date (and the corresponding value). There are some similar questions, but my question is more precise: is there a way to avoid loops?
Thank you
Upvotes: 0
Views: 29
Reputation: 23101
If your revisionTime is nicely formatted (Y-m-d H:M:S) always as in your example, you may not need to convert to Date time at all, this should simply work:
aggregate(revisionTime ~ date, df, max)
Upvotes: 0
Reputation: 887048
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'date' after converting to Date
class, order
the 'revisionTime' in descending order (in i
) and get the first row with head
.
library(data.table)
setDT(df1)[order(-revisionTime), head(.SD, 1), .(date = as.Date(date))]
Upvotes: 0