David Bijoyan
David Bijoyan

Reputation: 117

Why doesn't sqldf function see the condition?

I use sqldf function for data manipuation in R studio. And I met a problem that sqldf function doesn't take into account the condition I set up in my query. For example the code

d2 <- sqldf("select a.'n1.0', a.date, a.tot_cap*100/a.'n1.0' as afasf from data a where a.date < '2013-03-01'") 

gives exact the same result as code

d2 <- sqldf("select a.'n1.0', a.date, a.tot_cap*100/a.'n1.0' as afasf from data a").

Setting single quotes, double quotes, no quotes at all doesn't help.

Does anybody know what's the problem and how to resolve it?

Thanks in advance.

Here is the result of dput(data2) command where data2 is the first 40 rows of the initial dataset.

structure(list(REGN = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), date = structure(c(14761, 
14791, 14822, 14853, 14883, 14914, 14944, 14975, 15006, 15034, 
15065, 15095, 15126, 15156, 15187, 15218, 15248, 15279, 15309, 
15706, 15737, 15765, 15796, 15826, 15857, 15887, 15918, 15949, 
15979, 16010, 16040, 16071, 16102, 16130, 16161, 16191, 16222, 
16252, 16283, 16314), class = "Date"), tot_cap = structure(c(3.29680129097603e-316, 
3.27881038454837e-316, 3.31569035934127e-316, 3.42544269478722e-316, 
3.46449744773999e-316, 3.76857311386686e-316, 3.83375272419446e-316, 
3.89781752479879e-316, 3.88106524094525e-316, 3.91119504385185e-316, 
3.54278081534629e-316, 3.66904833254234e-316, 3.84160896084211e-316, 
3.83063042693901e-316, 3.87586035817945e-316, 3.98758302742101e-316, 
4.04557250040469e-316, 4.19023427922167e-316, 4.27870651560933e-316, 
5.55371435659506e-316, 5.45611929687006e-316, 5.73639651252866e-316, 
5.74625569130456e-316, 5.74576631928235e-316, 5.79667637503333e-316, 
5.86047635645149e-316, 5.66957630781748e-316, 6.16857297583704e-316, 
6.25638237375425e-316, 6.32633446059442e-316, 6.54992194176386e-316, 
6.44650520772079e-316, 6.41496840466797e-316, 6.51734725500882e-316, 
6.52890122716964e-316, 6.49014508749322e-316, 6.47399926921995e-316, 
6.49043322657787e-316, 6.47938176858544e-316, 6.65456810876004e-316
), class = "integer64"), bas_cap = structure(c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 5.24913731265079e-316, 5.71723269425777e-316, 
5.71723269425777e-316, 5.72079658738749e-316, 5.71723269425777e-316, 
5.71723269425777e-316, 5.71723269425777e-316, 5.84030513832873e-316
), class = "integer64"), osn_cap = structure(c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 5.24913731265079e-316, 5.71723269425777e-316, 
5.71723269425777e-316, 5.72079658738749e-316, 5.71723269425777e-316, 
5.71723269425777e-316, 5.71723269425777e-316, 5.84030513832873e-316
), class = "integer64"), n1.0 = c(NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 14.13, 13.91, 13.74, 13.25, 
13.11, 13.59, 13.07, 13.06), n1.1 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 11.65, 12.3, 12.11, 11.75, 
11.65, 12.04, 11.61, 11.53), n1.2 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 11.65, 12.3, 12.12, 11.77, 
11.66, 12.06, 11.62, 11.54), n1 = c(13.96, 13.66, 15.07, 15.75, 
15.49, 15.87, 15.7, 16.32, 16.17, 16.67, 14.97, 15.06, 15.22, 
15.02, 14.68, 14.42, 13.5, 13.29, 13.1, 13.25, 12.53, 13.28, 
13.35, 13.42, 13.67, 14.14, 13.1, 14.13, 14.13, 14.55, 14.53, 
14.51, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 40L
), class = "data.frame")

Upvotes: 1

Views: 143

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269491

There are several problems here:

  • single quotes are used for quoting constants. Use double quotes to quote variable names.

  • SQLite does not support a Date type so the dates are read in as numbers (number of days since the UNIX Epoch) and then the statement is attempting to compare that number to a character string.

  • although not wrong you don't really need the alias a since there is only one data frame so there cannot be any confusion as to what is being referenced.

  • the question refers to data2 but data is used in the R code. We have changed the R code to use data2.

See the links given in the comments under the question for more information.

SQLite

Using the default SQLite backend, we preface sqldf with fn$ to cause it to run anything between backticks in R and replace that expression with the result of that run.

d2 <- fn$sqldf("select \"n1.0\", date, tot_cap * 100/ \"n1.0\" as afasf 
  from data2 
  where date < `as.Date('2013-03-01')`") 

H2

Alternately, use the H2 backend. (As with SQLite, the H2 database is included in the R driver so you don't have to install it separately but you do need to ensure that you have Java installed -- fortunately that is extremely simple as it has an automated installation procedure.) That database does have a Date type and is able to compare such dates with correclty formatted character strings representing dates.

library(RH2)  # <----------note

d3 <- fn$sqldf("select \"n1.0\", date, tot_cap * 100 / \"n1.0\" as afasf 
  from data2
  where date < '2013-03-01'") 

Be sure to detach RH2 from the search path if you want to resume using SQLite.

Upvotes: 2

Related Questions