Deepak Agarwal
Deepak Agarwal

Reputation: 399

Filtering queries from data frame on date part using sqldf

I am trying to extract and join 2 data frames based on some date parts but its not working. The data frames are as follows :-

startdf

startperiod
2015-10-01
2016-10-01
2017-10-01
2018-10-01


enddf

endperiod
2016-03-31
2017-03-31
2018-03-31

Both startperiod and endperiod are of 'Date' data type

This is final output I desire :-

startperiod, endperiod
2015-10-01  2016-03-31
2016-10-01  2017-03-31
2017-10-01  2018-03-31
2018-10-01  Null

The equivalent SQL would be something like this :-

Select startperiod, endperiod
From startdf a lef join enddf b
On year(b.endperiod) = (year(a.startperiod) + 1)

is there a way to do in R? I believe I need to use library sqldf and RH2 but I couldn't get it going no matter what I did.

Simplistically, this should work but doesn't!

sqldf("Select * from startperioddf a where year(startperiod) = 2016")

Upvotes: 0

Views: 362

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269491

1) RH2 Assuming

  • the data shown in reproducible form in the Note below. In particular, note that startdate and enddate are assumed to be of Date class.
  • typos in the question are fixed
  • use of h2 database backend instead of the default sqlite

then your code works:

library(sqldf)
library(RH2)

sql <- "Select startperiod, endperiod
  From startdf a left join enddf b
  On year(b.endperiod) = (year(a.startperiod) + 1)"
sqldf(sql)

giving:

  startperiod  endperiod
1  2015-10-01 2016-03-31
2  2016-10-01 2017-03-31
3  2017-10-01 2018-03-31
4  2018-10-01       <NA>

Also

sqldf("Select * from startdf a where year(startperiod) = 2016")

giving:

  startperiod
1  2016-10-01

Be sure to read the material on the sqldf github site: https://github.com/ggrothendieck/sqldf

2) sqlite If you want to use the default sqlite backend then be sure that RH2 is NOT loaded (otherwise, it will assume you want to use it) and note that Date class variables will be uploaded to sqlite as integers representing the number of days since the unix epoch (since there is no Date class type in sqlite) so we need to convert days since the epoch to years (which can be done using strftime as shown).

sql2 <- "Select startperiod, endperiod
  From startdf a left join enddf b
  On strftime('%Y', b.endperiod * 3600 * 24, 'unixepoch') + 0 = 
     strftime('%Y', a.startperiod * 3600 * 24, 'unixepoch') + 1"
sqldf(sql2)

sqldf("Select * from startdf a 
  where strftime('%Y', a.startperiod * 3600 * 24, 'unixepoch') = '2016'")

Note

Lines1 <- "
startperiod
2015-10-01
2016-10-01
2017-10-01
2018-10-01"

Lines2 <- "
endperiod
2016-03-31
2017-03-31
2018-03-31"

startdf <- read.table(text = Lines1, header = TRUE, colClasses = "Date")
enddf <- read.table(text = Lines2, header = TRUE, colClasses = "Date")

Upvotes: 1

Siddharth Gosalia
Siddharth Gosalia

Reputation: 311

The sqldf package in R uses the SQLite database engine by default. Hence, you cannot use the year function in your query to extract the year part from the date. The following query will do the job:

sqldf("Select * from startdf where strftime('%Y', startperiod) = '2016'")

It uses SQLite's strftime function to compare specific date parts. The year function is defined under MySQL so you may have to install the RMySQL package and then use the drv = 'MySQL' argument to specify the database engine that you want sqldf to use.

Upvotes: 0

Related Questions