Jazzmine

Reputation: 1875

R sqldf not being selective in date range criteria

I am trying to select rows whose date is earlier than a given date. It doesn't seem to work: I get all the dates back, not just those earlier than the given date.

Here's the df structure:

str(sawdf)
'data.frame':   83597 obs. of  10 variables:
 $ actiondate       : Date, format: "2016-05-08" "2016-05-08" "2016-05-09" ...

And here's some sample data:

head(sawdf)
  actiondate 
2016-05-14
2016-05-15  
2016-05-16 
2016-05-17 
2016-05-18

And here is my sql:

sqldf("select distinct actiondate from sawdf where actiondate < '2016-05-18'")

And here's some of the results:

...
6  2016-05-13
7  2016-05-14
8  2016-05-15
9  2016-05-16
10 2016-05-17
11 2016-05-18
12 2016-05-19

As you can see, dates on or after 2016-05-18 are being selected.

I've tried several approaches but keep getting the same results.

Thanks

Upvotes: 0

Views: 1201

Answers (2)

G. Grothendieck

Reputation: 269491

1) sqlite Assuming you are using the default SQLite backend: SQLite does not have a date type, so dates are transferred to SQLite as the number of days since the UNIX Epoch. That is, on the SQLite side actiondate is a column of numbers. (If x were a "Date" class R variable, then as.numeric(x) gives the number or numbers transferred to SQLite.) We need to compare these numbers to an appropriate number, not to a character string. The following works because it converts the comparison date in the same way (i.e. it replaces $date0 with 16939, the number of days since the UNIX Epoch represented by that date):

library(sqldf)

date0 <- as.Date("2016-05-18")
fn$sqldf("select distinct actiondate from sawdf where actiondate < $date0")

There is more information on date processing in sqldf with SQLite on the sqldf home page on GitHub: https://github.com/ggrothendieck/sqldf
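As a quick check of the number mentioned above, here is a minimal base R sketch (no sqldf needed). The variable names days and back are only illustrative; it shows the numeric value behind a "Date" object and that the conversion is reversible:

```r
# Base R only: inspect the number that stands in for a Date value.
date0 <- as.Date("2016-05-18")
days <- as.numeric(date0)   # days since 1970-01-01
print(days)                 # 16939

# Converting the number back recovers the original date.
back <- as.Date(days, origin = "1970-01-01")
print(back)
```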

1a) This would also work since all dates get transferred in the same way:

library(sqldf)

Date0 <- data.frame(date0 = as.Date("2016-05-18"))
sqldf("select distinct actiondate from sawdf where actiondate < (select date0 from Date0)")

1b) Although it is a bit messy, rather than converting the comparison date to numeric, one could convert the actiondate column to character using an SQLite built-in function:

sqldf("select distinct actiondate from sawdf 
       where strftime('%Y-%m-%d', actiondate * 3600 * 24, 'unixepoch') < '2016-05-18'")
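To see what the expression in 1b computes, the same arithmetic can be sketched in base R. This is only an illustration of the idea (days times 3600 * 24 gives seconds since the epoch, which are then formatted as text); the names secs and txt are made up for the example:

```r
# Mimic the SQL expression from 1b in base R.
days <- as.numeric(as.Date("2016-05-18"))   # what SQLite sees
secs <- days * 3600 * 24                    # days -> seconds since the epoch

# Format the seconds as a yyyy-mm-dd string, using UTC to avoid time zone shifts.
txt <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%Y-%m-%d")
print(txt)

# Strings in yyyy-mm-dd form compare in date order.
print("2016-05-17" < txt)
```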

2) H2 Alternatively, use the H2 backend, which does have a date type. In that case the code in the question does work. Install RH2 (which includes H2) and make sure you have Java installed on your machine. Then:

library(RH2)
library(sqldf)
sqldf("select distinct actiondate from sawdf where actiondate < '2016-05-18'")

Note: The input we assumed, in reproducible form, is:

Lines <- "actiondate
2016-05-14
2016-05-15  
2016-05-16 
2016-05-17 
2016-05-18"
sawdf <- read.csv(text = Lines)
sawdf$actiondate <- as.Date(sawdf$actiondate)

Upvotes: 1

MVWyck

Reputation: 156

I can't comment yet, but @Gregor has a great solution. If you are bound and determined to use SQL, though, you can first convert the date to character (since SQLite doesn't have a date type):

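library(sqldf)

sawdf <- data.frame(actiondate = as.Date(c("2016-05-14", "2016-05-15", "2016-05-30")))
# Store the dates as yyyy-mm-dd strings so SQLite receives text, not numbers
sawdf$actiondate <- as.character(sawdf$actiondate)
str(sawdf)

# Strip the hyphens and compare the resulting yyyymmdd strings
sqldf("select actiondate 
  from sawdf where substr(actiondate,1,4)||substr(actiondate,6,2)||substr(actiondate,9,2) < '20160520'")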

  actiondate
1 2016-05-14
2 2016-05-15
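For comparison, the same filter can be sketched in base R without SQL. This is only an illustration of what the substr concatenation does (it removes the hyphens to give yyyymmdd strings); the names dates, compact and keep are made up for the example:

```r
# The same three dates, already as yyyy-mm-dd strings.
dates <- c("2016-05-14", "2016-05-15", "2016-05-30")

# Removing the hyphens gives the yyyymmdd form built by the substr calls.
compact <- gsub("-", "", dates, fixed = TRUE)

# Fixed-width digit strings compare in date order.
keep <- compact < "20160520"
print(dates[keep])
```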

Upvotes: 1
