statquant

Reputation: 14380

Have I found a bug in data.table with integer64?

I am having a lot of difficulty with data.table and integer64 (package bit64). My understanding is that integer64 cannot yet be used in a by clause, but I think I may also have found a bug in sorting: setkey does not order an integer64 column correctly.

library(data.table)
library(bit64)

test4 <- structure(list(IDFD = c("360627720722618433", "360627720722618433"
), CDVCA = c("2013-03-13T09:36:07.795", "2013-03-13T09:36:07.795"
), NUMSEQ = structure(c(1.05397451390436e-309, 1.05397443975625e-309
), class = "integer64")), .Names = c("IDFD", "CDVCA", "NUMSEQ"
), row.names = c(NA, -2L), class = "data.frame")

str(test4)
'data.frame':   2 obs. of  3 variables:
 $ IDFD  : chr  "360627720722618433" "360627720722618433"
 $ CDVCA : chr  "2013-03-13T09:36:07.795" "2013-03-13T09:36:07.795"
 $ NUMSEQ:Class 'integer64'  num [1:2] 1.05e-309 1.05e-309

test4 <- as.data.table(test4)

str(test4)
Classes ‘data.table’ and 'data.frame':  2 obs. of  3 variables:
 $ IDFD  : chr  "360627720722618433" "360627720722618433"
 $ CDVCA : chr  "2013-03-13T09:36:07.795" "2013-03-13T09:36:07.795"
 $ NUMSEQ:Class 'integer64'  num [1:2] 1.05e-309 1.05e-309
 - attr(*, ".internal.selfref")=<externalptr> 

setkey(test4,IDFD,CDVCA,NUMSEQ)
test4
                 IDFD                   CDVCA          NUMSEQ
1: 360627720722618433 2013-03-13T09:36:07.795 213326816542720
2: 360627720722618433 2013-03-13T09:36:07.795 213326801534975 # NOT SORTED: this NUMSEQ is smaller than the one in row 1

Am I right?

Upvotes: 1

Views: 401

Answers (2)

user5099519

You can work around this without changing the stored values by ordering on a converted copy of the column:

df[order(as.numeric(as.character(myint64field)), myotherfield),]

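Expect a performance hit from the conversion. Also note that the numeric copy is exact only for values below 2^53 in magnitude; larger integer64 values lose precision when converted to double.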
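
As a rough illustration of that workaround, here is a minimal sketch on a plain data.frame. The frame `df`, its `id` column, and the variable names `ord` and `sorted` are made up for the example; the two NUMSEQ values are the ones from the question. It assumes the bit64 package is installed.

```r
library(bit64)

# Hypothetical two-row frame; NUMSEQ holds the question's values in unsorted order.
df <- data.frame(id = c("a", "a"), stringsAsFactors = FALSE)
df$NUMSEQ <- as.integer64(c("213326816542720", "213326801534975"))

# Build the ordering from a double copy of the integer64 column.
# The stored integer64 values are untouched; only the sort key is converted.
# This is exact only while the values stay below 2^53 in magnitude.
ord <- order(as.numeric(as.character(df$NUMSEQ)))
sorted <- df[ord, ]

print(sorted)
```

The column in `sorted` is still of class integer64, so nothing downstream has to change; only the ordering step pays for the conversion.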

Upvotes: 0

Arun

Reputation: 118839

Update: This is now implemented in v1.9.3 (available from R-Forge); see NEWS:

o bit64::integer64 now works in grouping and joins, #5369. Thanks to James Sams for highlighting UPCs and Clayton Stanley.
Reminder: fread() has been able to detect and read integer64 for a while.

On OP's example above:

test4
#                  IDFD                   CDVCA          NUMSEQ
# 1: 360627720722618433 2013-03-13T09:36:07.795 213326801534975 ## sorted right
# 2: 360627720722618433 2013-03-13T09:36:07.795 213326816542720
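
For anyone who wants to check this on their own installation, the following sketch rebuilds the question's table with as.integer64() rather than the raw structure() dump, keys it, and then groups by the integer64 column. It assumes a data.table version with the integer64 support described above (v1.9.3 or later) plus bit64; the names `dt` and `counts` are only illustrative.

```r
library(data.table)
library(bit64)

# Same data as the question, with NUMSEQ deliberately in descending order.
dt <- data.table(
  IDFD   = c("360627720722618433", "360627720722618433"),
  CDVCA  = c("2013-03-13T09:36:07.795", "2013-03-13T09:36:07.795"),
  NUMSEQ = as.integer64(c("213326816542720", "213326801534975"))
)

# Keying sorts by IDFD, then CDVCA, then the integer64 column NUMSEQ.
setkey(dt, IDFD, CDVCA, NUMSEQ)
print(dt)

# Grouping on an integer64 column, which the NEWS item says is now supported.
counts <- dt[, .N, by = NUMSEQ]
print(counts)
```

If the installed version predates that change, the keyed order may still come out wrong, in which case upgrading is the simplest fix.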

Upvotes: 2
