R after replacing read.csv with fread incorrect number of dimensions error appears

Question

I was loading my csv file with plain:

baseData <- read.csv(datafile)

but since I want to load larger datasets, I have moved to the data.table package:

baseData <- fread(input = paste("zcat < ", datafile, sep=""))

Everything seems to work fine, and the data loads much faster, but when I hit the following lines:

d <- baseData[baseData$some_prop==0,]
d <- d[!is.na(d[,"col"]) & (d[,"col"] == 0 | d[,"col"] == 1),]

I get an "incorrect number of dimensions" error.

With read.csv everything works fine. Any idea what could be going wrong?

Tensibai · Accepted Answer

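In a data.table, the j part of the subsetting is evaluated as an expression inside the table, so column names should not be quoted: a quoted name is just a character string, and you get exactly that string back. (This is the behaviour of data.table versions before 1.9.8; from 1.9.8 on, d[,"B"] returns a one-column data.table instead.)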

Example:

> library(data.table)
> d <- data.table(A=1:6, B=5:10)
> d[,A]
[1] 1 2 3 4 5 6
> d[,B]
[1]  5  6  7  8  9 10
> d[,"B"]
[1] "B"

So for your particular case, removing the quotes around the column names (d[,col] instead of d[,"col"]) should fix the error.
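
As a rough sketch of what the two lines from the question could look like with unquoted column names: the small baseData table below is a made-up stand-in for the data loaded with fread, and some_prop and col are the placeholder column names from the question.

```r
library(data.table)

# Hypothetical stand-in for the table returned by fread()
baseData <- data.table(some_prop = c(0, 0, 1, 0, 0),
                       col       = c(0, 1, 1, NA, 2))

# Inside [.data.table, columns are referred to by their bare names
d <- baseData[some_prop == 0]
d <- d[!is.na(col) & (col == 0 | col == 1)]

print(d)
```

The second filter could also be written as d[col %in% c(0, 1)], since %in% returns FALSE for NA.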

If your code is long and relies on data.frame methods, you can call setDF(d) to convert the data.table to a data.frame, so that the existing code works as-is until you refactor it.
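
A minimal sketch of that stopgap, again on a made-up table rather than the real data: setDF converts the object in place, after which the original data.frame-style lines from the question run unchanged.

```r
library(data.table)

# Hypothetical stand-in for the table returned by fread()
baseData <- data.table(some_prop = c(0, 0, 1), col = c(0, NA, 1))

# Convert by reference; baseData is a plain data.frame afterwards
setDF(baseData)

# The original lines from the question, untouched
d <- baseData[baseData$some_prop == 0, ]
d <- d[!is.na(d[, "col"]) & (d[, "col"] == 0 | d[, "col"] == 1), ]

print(class(baseData))
print(d)
```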

To be complete, the error arises because your logical expression has length 1 ("col" == whatever returns a single TRUE or FALSE), which does not match the number of rows of your data.table object.
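
To see this in isolation, the condition can be evaluated with the literal string that the quoted column name yields. This is plain base R and only illustrates the length mismatch, not the internals of data.table.

```r
# What the filter condition reduces to when d[,"col"] yields the string "col"
cond <- !is.na("col") & ("col" == 0 | "col" == 1)

# A single logical value, regardless of how many rows the table has
print(length(cond))
print(cond)
```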
