Using data.table subsetting for non-equality conditions

Question

I have a data table with 400k rows, and subsetting it is very slow.

Here is a sample data frame:

                 date   name value size car1 car2
1 2015-01-01 07:44:00    bob     1    5    A    D
2 2015-02-02 09:46:00 george   522    2    B    F

Now I subset it the slow way using subset():

main<- data.frame(date = as.POSIXct(c("2015-01-01 07:44:00","2015-02-02 09:46:00"),tz="GMT"),name= c("bob","george"),value=c(1,522), size= c(5,2), car1=c("A","B"), car2=c("D","F"))
main$date
subset(main,    size >1 
       &  value == 522
       &  name == "george" 
       &  date >= as.POSIXct("2015-01-01 03:44:00",tz="GMT") &  date >= as.POSIXct("2015-01-01 08:44:00",tz="GMT")
       &  (car1 == "F" | car2 == "F")
)

                 date   name value size car1 car2
2 2015-02-02 09:46:00 george   522    2    B    F

This works and returns 1 row but it is very slow.

Based on some responses to another question, data.table looks to be much faster, so I would like to use it to do the same thing as above. I have a few questions, though.

Here is what I have so far:

library(data.table)
mdt <- as.data.table(main)
setkey(mdt, date, name, value, size, car1, car2)
mdt[.(as.POSIXct("2015-01-01 03:44:00"), "george", 522, 2, "F", "F")]

This returns:

                  date   name value size car1 car2
1: 2015-01-01 03:44:00 george   522    2   NA    F

Here are my questions:

(1) I want a criterion of the form date >= X and date <= Y. Is this possible using data.table? If not, are there any ideas for making the subsetting faster?

(2) I want a criterion of the form (car1 == "F" | car2 == "F"). Is this possible? If not, are there any ideas for making the subsetting faster?

(3) The output of mdt[] shows a date of 2015-01-01 03:44:00, but this date IS NOT in the original "main" data frame. What is happening here?

(4) The output of mdt[] shows a car1 value of NA, but car1 is not NA anywhere in the original "main" data frame. What is happening here?

Thank you.
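
For reference, here is a minimal sketch of how questions (1) to (4) might be explored on the sample data. It is not a tuned solution for the 400k-row table. The bounds `from` and `to`, the lookup value "alice", and the two-column key are hypothetical and chosen only for illustration. The sketch assumes that logical expressions (ranges, OR) can be written directly in the i argument of a data.table, and that a keyed lookup with no match returns the looked-up values with NA in the remaining columns unless nomatch = 0L is given, which would be consistent with the row seen in (3). The NA in (4) may be because car1 was stored as a factor with no "F" level, but that is a guess.

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table
# (character columns stay character here, not factor).
mdt <- data.table(
  date  = as.POSIXct(c("2015-01-01 07:44:00", "2015-02-02 09:46:00"), tz = "GMT"),
  name  = c("bob", "george"),
  value = c(1, 522),
  size  = c(5, 2),
  car1  = c("A", "B"),
  car2  = c("D", "F")
)

# Hypothetical date bounds, for illustration only.
from <- as.POSIXct("2015-01-01 03:44:00", tz = "GMT")
to   <- as.POSIXct("2015-03-01 00:00:00", tz = "GMT")

# (1) and (2): range and OR conditions written as an ordinary
# logical expression in i.
res <- mdt[size > 1 & value == 522 & name == "george" &
             date >= from & date <= to &
             (car1 == "F" | car2 == "F")]
print(res)

# (3) and (4): a keyed lookup is a join, not a filter. Key on two
# columns only, to keep the example small.
setkey(mdt, name, value)

# "alice" is a hypothetical name that is not in the data. The result
# still has one row: the looked-up key values plus NA elsewhere.
miss <- mdt[.("alice", 522)]
print(miss)

# nomatch = 0L drops rows of i that have no match in the table.
miss_dropped <- mdt[.("alice", 522), nomatch = 0L]
print(miss_dropped)
```

Whether the logical-expression form is fast enough on the full table would need to be measured. Keyed lookups only express equality on the key columns, which is why the range and OR conditions are written as a plain filter here.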
