J.P. Le Cavalier
J.P. Le Cavalier

Reputation: 1345

Interval class in data.table

My question will be explained throughout the following reproducible example.

First, let's load the required packages and create a POSIXct and a data.table object.

library(data.table)
library(lubridate)

target_date <- ymd(20180601, tz='America/Montreal')

test <- data.table(
  V1 = seq(1:3),
  V2 = c(ymd(20170421, tz='America/Montreal'),
         ymd(20170702, tz='America/Montreal'),
         ymd(20180113, tz='America/Montreal'))
)

As we can see below, everything worked fine.

test[]

##     V1         V2
##  1:  1 2017-04-21
##  2:  2 2017-07-02
##  3:  3 2018-01-13

Then, I create a new column which contains Interval objects.

test[, V3:=interval(V2, target_date)]

Still there, everything is alright.

test[]

##     V1         V2                             V3
##  1:  1 2017-04-21 2017-04-21 EDT--2018-06-01 EDT
##  2:  2 2017-07-02 2017-07-02 EDT--2018-06-01 EDT
##  3:  3 2018-01-13 2018-01-13 EST--2018-06-01 EDT

Then, I want to create two new columns which will only be defined on the first two rows: the first one is a function of the POSIXct column and the second one is a function of the Interval column.

test[c(1, 3), V4 := V2 + days(20)]
test[]

##     V1         V2                             V3         V4
##  1:  1 2017-04-21 2017-04-21 EDT--2018-06-01 EDT 2017-05-11
##  2:  2 2017-07-02 2017-07-02 EDT--2018-06-01 EDT       <NA>
##  3:  3 2018-01-13 2018-01-13 EST--2018-06-01 EDT 2018-02-02

Still correct at this point.

Then, when I try the same thing with a column function of the Interval column, I get the following warning

test[c(1, 3), V5 := ymd(20180101, tz='America/Montreal') %within% V3]

##  Warning messages:
##  1: In as.numeric(a) - as.numeric(b@start) <= [email protected] :
##    longer object length is not a multiple of shorter object length
##  2: In `[.data.table`(test, c(1, 3), `:=`(V5, ymd(20180101, tz = "America/Montreal") %within%  :
##  Supplied 3 items to be assigned to 2 items of column 'V5' (1 unused)

This warning tells me that, in fact, the command produced 3 values and that I try to input those in a 2 rows data.table (the filtered data.table has two rows).

I tried to figure out what the problem was and I think I found a hint. Consider the two following commands:

test[, V3][2]
##  [1] 2017-07-02 EDT--2018-06-01 EDT

test[2, V3]
##  [1] 2017-04-21 EDT--2018-03-21 EDT 2017-07-02 EDT--2018-06-01 EDT 2018-01-13 EST--2018-12-13 EST

In fact, I thought those two commands would have produce the same result, which is not the case. Even more surprising, they both have a length of 1 (when looking at the output above, I thought the second command would have a length of 3).

length(test[, V3][2])
##  [1] 1

length(test[2, V3])
##  [1] 1

The thing is, these two commands produce an object with a vector of 1 in the .Data slot (the length of the Interval in seconds I guess),

test[, V3][2]@.Data
##  [1] 28857600

test[2, V3]@.Data
##  [1] 28857600

but the first has a vector of 1 in the start slot while the second has a vector of 3 in the start slot.

test[, V3][2]@start
##  [1] "2017-07-02 EDT"

test[2, V3]@start
##  [1] "2017-04-21 EDT" "2017-07-02 EDT" "2018-01-13 EST"

I know I could probably fix this problem by encapsulating each Interval objects in a list and extract them every time I need, but is there another way to deal with this?

Upvotes: 5

Views: 395

Answers (1)

Matt
Matt

Reputation: 588

Might be a bit filthy, but could you just go:

test[c(1, 3), V5 := ymd(20180101, tz='America/Montreal') %within% V3[.I]]

test[, dat := (V3[.I]@.Data)]

Upvotes: 2

Related Questions