horaceT

Reputation: 651

Unexpected behavior with POSIXct datetimes under diff

When diff() is applied to POSIXct datetimes, the result can be unexpected: the unit of the differences is not always the same.

On POSIXct datetimes in hourly increments, diff() works as expected. If the hours are contiguous, diff() gives the differences in hours, as seen below.

beg = ISOdatetime(2016, 11, 6, 1, 0 ,0, tz="Americ/Los_Angeles")
end = ISOdatetime(2016, 11, 7, 23, 0 ,0, tz="Americ/Los_Angeles")
dte = seq(from=beg, to=end, by="hour")
del = diff(dte)
table(del)
del
  1 
 46 

If there are gaps, the result is still in hours, which makes sense.

dte = dte[-4]
del = diff(dte)
table(del)
 del
 1  2 
44  1

Now, here is the interesting behavior.

dte1 = sort(c(dte, dte[10]))
del = diff(dte1)
table(del)
del
 0 3600 7200 
 1   44    1 

Here I added a duplicate hour, and all of a sudden the diff unit is seconds.

Is this a bug?

Upvotes: 1

Views: 114

Answers (2)

alistaire

Reputation: 43334

If you read the source for diff.POSIXt, it contains the code

r <- r[i1] - r[-length(r):-(length(r) - lag + 1L)]

where r is the POSIXct sequence and i1 is defined by

i1 <- -seq_len(lag)

which, if the lag parameter is left at its default of 1, is just -1. Thus, diff(dte1) is equivalent to

dte1[-1L] - dte1[-length(dte1):-(length(dte1) - 1L + 1L)]

which you can simplify to

dte1[-1L] - dte1[-length(dte1)]
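
As a quick check of that equivalence, here is a minimal sketch. It uses a made-up UTC sequence (`x`, `via_diff`, and `via_sub` are illustrative names, not from the question) so that the result does not depend on daylight-saving rules.

```r
# Hypothetical hourly sequence in UTC (not the question's data)
x <- seq(ISOdatetime(2016, 11, 6, 1, 0, 0, tz = "UTC"),
         by = "hour", length.out = 6)
x <- sort(c(x, x[3]))          # add a duplicate, as in the question

via_diff <- diff(x)                 # what diff() returns
via_sub  <- x[-1L] - x[-length(x)]  # plain subtraction of shifted vectors

# Compare the numeric values and the automatically chosen units
same_values <- all(as.numeric(via_diff) == as.numeric(via_sub))
same_units  <- units(via_diff) == units(via_sub)
print(c(same_values, same_units))
```

Both routes go through the same subtraction, so they should agree on the values and on the units.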

If you look at ?difftime, you see that

Subtraction of date-time objects gives an object of this class, by calling difftime with units = "auto".

Calling difftime with units = "auto" determines units by

If units = "auto", a suitable set of units is chosen, the largest possible (excluding "weeks") in which all the absolute differences are greater than one.

so the chosen units can vary with the data. If you want particular units, you can reconstruct the operation with difftime directly:

difftime(dte1[-1], dte1[-length(dte1)], units = 'hours')

## Time differences in hours
##  [1] 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [47] 1 1
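
To see the "auto" rule in isolation, the sketch below builds two small hand-made vectors. The names `t0`, `gaps_only`, `with_dup`, and `hrs` are assumptions for illustration, and UTC is used to keep daylight-saving changes out of the picture.

```r
t0 <- ISOdatetime(2016, 11, 6, 1, 0, 0, tz = "UTC")

# Offsets in seconds from t0: hourly steps with one two-hour gap
gaps_only <- t0 + 3600 * c(0, 1, 2, 4, 5)
# Same, but with one time stamp repeated
with_dup  <- t0 + 3600 * c(0, 1, 2, 2, 4, 5)

units(diff(gaps_only))  # "hours": the smallest difference is 1 hour
units(diff(with_dup))   # "secs": the smallest difference is 0

# Asking for the units explicitly removes the dependence on the data
hrs <- difftime(with_dup[-1L], with_dup[-length(with_dup)], units = "hours")
as.numeric(hrs)
```

The only change between the two vectors is the repeated time stamp, and that alone is enough to switch the automatic units.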

Upvotes: 0

IRTFM

Reputation: 263342

There is a units<- function for difftime objects:

> units(del) <- 'hours'
> table(del)
del
 0  1 
 1 46 

The ?difftime help page says:

If units = "auto", a suitable set of units is chosen, the largest possible (excluding "weeks") in which all the absolute differences are greater than one.

So the 0 value in your case is what changes the units: with a zero present, not all absolute differences are greater than one when measured in hours (or minutes), so "auto" falls back to seconds.
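
If the goal is plain numbers and not a difftime object, one option is to pass units to as.numeric(), which avoids relying on whatever "auto" picked. A minimal sketch, using a hypothetical vector `stamps` in place of the question's data:

```r
# Hypothetical time stamps in UTC: one duplicate and one two-hour gap
stamps <- ISOdatetime(2016, 11, 6, 1, 0, 0, tz = "UTC") + 3600 * c(0, 1, 1, 3)

d <- diff(stamps)                            # units chosen automatically
hours_num <- as.numeric(d, units = "hours")  # numeric vector, in hours

units(d) <- "hours"                          # or convert the difftime itself
table(hours_num)
```

Either form fixes the unit up front, so downstream code does not have to inspect units(d) first.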

Upvotes: 2
