Amitai

Reputation: 891

Repeated rolling join without looping

What is the most efficient way to find the latest 'value' just prior to every day of 2015, grouped by (loc.x, loc.y) pairs?

library(data.table)

dt <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                    "2015-12-31", "2014-09-13", "2015-08-19")), 
  value = letters[1:6]
)
setkey(dt, loc.x, loc.y, time)

Required output:

   loc.x loc.y 2015-01-01  ...  2015-12-31
1:     1     1         NA                a
2:     1     2         NA                f
3:     3     1          e                c

Upvotes: 2

Views: 59

Answers (1)

David Arenburg

Reputation: 92292

You could create a lookup table with all dates in 2015 and the unique values of loc.x and loc.y using CJ, and then run a rolling join combined with dcast.

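# Cross join the unique loc.x and loc.y values with every day of 2015
Lookup <- do.call(CJ, c(unique = TRUE,
                        as.list(dt[, .(loc.x, loc.y)]),
                        list(time = seq(as.IDate("2015-01-01"),
                                        as.IDate("2015-12-31"),
                                        by = "day"))))

# Rolling join: roll = TRUE uses an exact date match or else carries the last
# earlier observation forward; nomatch = 0L drops lookup rows with no match.
# dcast then reshapes the result to one column per day.
dcast(dt[Lookup, roll = TRUE, nomatch = 0L], loc.x + loc.y ~ time, value.var = "value")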

#    loc.x loc.y 2015-01-01 2015-01-02 2015-01-03 
# 1:     1     1         NA         NA         NA
# 2:     1     2         NA         NA         NA 
# 3:     3     1          e          e          e ... (truncated)

#    2015-12-26 2015-12-27 2015-12-28 2015-12-29 2015-12-30 2015-12-31
# 1:          a          a          a          a          a          a
# 2:          f          f          f          f          f          d
# 3:          c          c          c          c          c          c
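
Note that the 2015-12-31 column above shows d for (1, 2), while the question's required output shows f. This is because roll = TRUE also matches an observation dated on the lookup day itself. If "just prior" is meant as strictly earlier (an assumption on my part), one possible sketch is to join on the previous day instead. The names pairs, days, lookup, day, long and wide are only illustrative and not part of the original code.

```r
library(data.table)

dt <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time  = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                     "2015-12-31", "2014-09-13", "2015-08-19")),
  value = letters[1:6]
)
setkey(dt, loc.x, loc.y, time)

# Only the (loc.x, loc.y) pairs that actually occur in dt
pairs <- unique(dt[, .(loc.x, loc.y)])
days  <- seq(as.IDate("2015-01-01"), as.IDate("2015-12-31"), by = "day")

# One row per pair per day
lookup <- pairs[rep(seq_len(nrow(pairs)), each = length(days))]
lookup[, day := rep(days, times = nrow(pairs))]

# Assumption: "just prior" means strictly before `day`, so the join uses
# the previous day and an observation dated on `day` itself is not picked up
lookup[, time := day - 1L]

# Rolling join on the shifted date; unmatched rows keep value = NA
long <- dt[lookup, on = .(loc.x, loc.y, time), roll = TRUE]

# One row per pair, one column per day
wide <- dcast(long, loc.x + loc.y ~ day, value.var = "value")
```

With this shift, the 2015-12-31 column for (1, 2) should come out as f, in line with the required output in the question. Dropping nomatch = 0L keeps every day as a column even when no pair has a match on that day.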

Upvotes: 2
