Ina

Reputation: 4470

`data.table` error: "reorder received irregular lengthed list" in setkey

I have a fairly basic data.table in R, with 250k rows and 90 columns. I am trying to key the data.table on one of the columns which is of class character. When I call:

setkey(my.dt,my.column)

I receive the following cryptic error message:

"Error in setkeyv(x, cols, verbose=verbose) :
reorder received irregular lengthed list"

I have found a source-code commit with this message, but can't quite decipher what it means. My key column contains no NA or blank values, seems perfectly reasonable to look at (it contains stock tickers), and behaves well with the default order() command.

Even more frustrating, the following code completes correctly:

first.dt <- my.dt[1:100000]
setkey(first.dt,my.column)
second.dt <- my.dt[100001:nrow(my.dt)]
setkey(second.dt,my.column)

I have no idea what could be going on here. Any tips?

Edit 1: I have confirmed every value in the key fits a fairly standard format:

> length(grep("[A-Z]{3,4}\\.[A-Z]{2}",my.dt$my.column)) == nrow(my.dt)
[1] TRUE

Edit 2: My system info is below (note that I'm actually using Windows 7). I am using data.table version 1.8.

> Sys.info()
          sysname           release           version          nodename           machine             login 
        "Windows" "Server 2008 x64"      "build 7600" "WIN-9RH28AH0CKG"          "x86-64"   "Administrator" 
             user    effective_user 
  "Administrator"   "Administrator" 

Upvotes: 1

Views: 278

Answers (1)

Matt Dowle

Reputation: 59612

Please run :

sapply(my.dt, length)

I suspect that one or more columns have a different length from the first column, which makes it an invalid data.table. It won't be one of the first 5, because your .Internal(inspect(my.dt)) output (thanks) shows those and they're OK.
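To make that check easier to read on a 90-column table, here is a minimal sketch that lists only the columns whose length differs from the first. The list `cols` and its column names are made up for illustration; in practice you would pass `my.dt` instead.

```r
# Hypothetical stand-in for my.dt: a plain list whose last column is too short.
cols <- list(ticker = c("ABC.US", "DEFG.LN", "HIJ.US"),
             price  = c(1.5, 2.5, 3.5),
             volume = c(100L, 200L))

# Length of every column, as a named integer vector.
lens <- vapply(cols, length, integer(1))

# Names of the columns whose length differs from that of the first column.
bad <- names(lens)[lens != lens[1]]

print(lens)
print(bad)
```

If `bad` comes back empty, all columns have the same length and the problem lies elsewhere.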

If so, there is this bug fix in v1.8.1 :

o rbind() of DT with an irregular list() now recycles the list items correctly, #2003. Test added.

Is there any chance that an rbind() with an irregular-length list happens at an earlier point, when my.dt is created? If not, please step through your code, running sapply(my.dt, length) after each step, to see where the invalid-length column is being created. Armed with that, we can make a workaround and also fix the potential bug. Thanks.

EDIT :

The original cryptic error message is now improved in v1.8.1, as follows :

DT = list(a=6:1,b=4:1)
setattr(DT,"class",c("data.table","data.frame"))
setkey(DT,a)

Error in setkeyv(x, cols, verbose = verbose) : 
  Column 2 is length 4 which differs from length of column 1 (6). Invalid
  data.table. Check NEWS link at top of ?data.table for latest bug fixes. If
  not already reported and fixed, please report to datatable-help.

NB: This method of creating a data.table is not recommended, because it lets you create an invalid data.table. Use it only if you are really sure the list is regular and you really do need the speed (i.e. you want to avoid the checks that as.data.table() and data.table() do), or if you need to demonstrate an invalid data.table, as I'm doing here.
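If you do go the fast route, a cheap guard is to confirm the list is regular before setting the class attribute. This is only a sketch: `is_regular` is a hypothetical helper written for this answer, not a data.table function.

```r
# Hypothetical helper (not part of data.table): TRUE when every element of
# the list has the same length, i.e. the list is safe to treat as columns.
is_regular <- function(x) {
  lens <- vapply(x, length, integer(1))
  length(unique(lens)) <= 1L
}

irregular <- list(a = 6:1, b = 4:1)  # the invalid example from above
regular   <- list(a = 6:1, b = 6:1)

print(is_regular(irregular))
print(is_regular(regular))
```

Only call setattr() on the list when the check passes; otherwise fall back to data.table() or as.data.table() and let them do their own checks.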

Upvotes: 1
