Reputation: 399
I am trying to extract and join 2 data frames based on some date parts but its not working. The data frames are as follows :-
startdf
startperiod
2015-10-01
2016-10-01
2017-10-01
2018-10-01
enddf
endperiod
2016-03-31
2017-03-31
2018-03-31
Both startperiod
and endperiod
are of 'Date' data type
This is final output I desire :-
startperiod, endperiod
2015-10-01 2016-03-31
2016-10-01 2017-03-31
2017-10-01 2018-03-31
2018-10-01 Null
The equivalent SQL would be something like this :-
Select startperiod, endperiod
From startdf a lef join enddf b
On year(b.endperiod) = (year(a.startperiod) + 1)
is there a way to do in R? I believe I need to use library sqldf
and RH2
but I couldn't get it going no matter what I did.
Simplistically, this should work but doesn't!
sqldf("Select * from startperioddf a where year(startperiod) = 2016")
Upvotes: 0
Views: 362
Reputation: 269491
1) RH2 Assuming
startdate
and enddate
are assumed to be of Date
class.then your code works:
library(sqldf)
library(RH2)
sql <- "Select startperiod, endperiod
From startdf a left join enddf b
On year(b.endperiod) = (year(a.startperiod) + 1)"
sqldf(sql)
giving:
startperiod endperiod
1 2015-10-01 2016-03-31
2 2016-10-01 2017-03-31
3 2017-10-01 2018-03-31
4 2018-10-01 <NA>
Also
sqldf("Select * from startdf a where year(startperiod) = 2016")
giving:
startperiod
1 2016-10-01
Be sure to read the material on the sqldf github site: https://github.com/ggrothendieck/sqldf
2) sqlite If you want to use the default sqlite backend then be sure that RH2 is NOT loaded (otherwise, it will assume you want to use it) and note that Date class variables will be uploaded to sqlite as integers representing the number of days since the unix epoch (since there is no Date class type in sqlite) so we need to convert days since the epoch to years (which can be done using strftime
as shown).
sql2 <- "Select startperiod, endperiod
From startdf a left join enddf b
On strftime('%Y', b.endperiod * 3600 * 24, 'unixepoch') + 0 =
strftime('%Y', a.startperiod * 3600 * 24, 'unixepoch') + 1"
sqldf(sql2)
sqldf("Select * from startdf a
where strftime('%Y', a.startperiod * 3600 * 24, 'unixepoch') = '2016'")
Lines1 <- "
startperiod
2015-10-01
2016-10-01
2017-10-01
2018-10-01"
Lines2 <- "
endperiod
2016-03-31
2017-03-31
2018-03-31"
startdf <- read.table(text = Lines1, header = TRUE, colClasses = "Date")
enddf <- read.table(text = Lines2, header = TRUE, colClasses = "Date")
Upvotes: 1
Reputation: 311
The sqldf
package in R uses the SQLite
database engine by default. Hence, you cannot use the year
function in your query to extract the year part from the date. The following query will do the job:
sqldf("Select * from startdf where strftime('%Y', startperiod) = '2016'")
It uses SQLite's strftime
function to compare specific date parts. The year
function is defined under MySQL
so you may have to install the RMySQL
package and then use the drv = 'MySQL'
argument to specify the database engine that you want sqldf
to use.
Upvotes: 0