RockScience
RockScience

Reputation: 18580

R: extract the latest revisionTime for each date

I have a revision data frame with 3 columns:

  1. revisionTime
  2. date
  3. value

For instance here is a sample, but mine is very very long (several hundred of thousands of rows)

df = structure(list(revisionTime = structure(c(1471417781, 1471417781, 
1471417781, 1473978576, 1473978576, 1473978576), class = c("POSIXct", 
"POSIXt"), tzone = ""), date = structure(c(1464652800, 1467244800, 
1469923200, 1456704000, 1467244800, 1472601600), class = c("POSIXct", 
"POSIXt"), tzone = ""), value = c(103.7, 104.1, 104.9, 104.414, 
104.3, 104.4)), .Names = c("revisionTime", "date", "value"), row.names = 536:541, class = "data.frame")

What I need is a very fast way to extract from this data.frame the latest revisionTime for each date (and the corresponding value). There are some similar questions, but my question is more precise: is there a way to avoid loops?

Thank you

Upvotes: 0

Views: 29

Answers (2)

Sandipan Dey
Sandipan Dey

Reputation: 23101

If your revisionTime is nicely formatted (Y-m-d H:M:S) always as in your example, you may not need to convert to Date time at all, this should simply work:

aggregate(revisionTime ~ date, df, max)

Upvotes: 0

akrun
akrun

Reputation: 887048

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'date' after converting to Date class, order the 'revisionTime' in descending order (in i) and get the first row with head.

library(data.table)
setDT(df1)[order(-revisionTime), head(.SD, 1), .(date = as.Date(date))]

Upvotes: 0

Related Questions