cts
cts

Reputation: 1820

filter dplyr sqlite3 connection using regex

I want to select rows from an sqlite3 connection through dplyr using a regular expression, unfortunately this doesn't seem possible. Is there a work around to filtering these rows based on a regex? Code below shows the error.

library(nycflights13)
my_db <- src_sqlite("my_db.sqlite3", create = T)
flights_sqlite <- copy_to(my_db, flights, temporary = FALSE, indexes = list(c("year", "month", "day"), "carrier", "tailnum"))
flights_sqlite <- tbl(nycflights13_sqlite(), "flights")
filter(flights_sqlite, grepl("N9.*", tailnum))

#> Error in sqliteSendQuery(con, statement, bind.data) :
#> error in statement: no such function: GREPL

Upvotes: 2

Views: 341

Answers (1)

jeremycg
jeremycg

Reputation: 24945

Unfortunately, dplyr can't translate many useful functions into the sql query it passes on to sqlite. You can see the list of functions it can in the dplyr database vignette:

dplyr knows how to convert the following R functions to SQL:

  • basic math operators: +, -, *, /, %%, ^
  • math functions: abs, acos, acosh, asin, asinh, atan, atan2, atanh, ceiling, cos, cosh, cot, coth, exp, floor, log, log10, round, sign, sin, sinh, sqrt, tan, tanh
  • logical comparisons: <, <=, !=, >=, >, ==, %in%
  • boolean operations: &, &&, |, ||, !, xor
  • basic aggregations: mean, sum, min, max, sd, var

However, dplyr will leave alone anything it can't translate and pass it to sqlite. This is where your error comes from - dplyr can't translate grepl, and so passes it to sqlite which throws the error.

If you know a little sql, you can write your own equivalent query, using %like%:

filter(flights_sqlite, tailnum %like% "N9%")

Source: sqlite 3.8.6 
From: flights [30,216 x 16]
Filter: tailnum %like% "N9%" 

    year month   day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin  dest air_time
   (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)  (int)  (chr) (chr)    (dbl)
1   2013     1     1      602        -8      812        -8      DL  N971DL   1919    LGA   MSP      170
2   2013     1     1      608         8      807        32      MQ  N9EAMQ   3768    EWR   ORD      139
3   2013     1     1      655        -5     1002       -18      DL  N997DL   2003    LGA   MIA      161
4   2013     1     1      659        -6      907        -6      DL  N998DL    831    LGA   DTW      105
5   2013     1     1      717        -3      850        10      FL  N978AT    850    LGA   MKE      134
6   2013     1     1      754        -5     1039        -2      DL  N935DL   2047    LGA   ATL      126
7   2013     1     1      759        -1     1057       -30      DL  N955DL   1843    JFK   MIA      158
8   2013     1     1      804        -6     1103       -13      DL  N947DL   1959    JFK   MCO      147
9   2013     1     1      810         0     1048        11      9E  N915XJ   3538    JFK   MSP      189
10  2013     1     1      814         4     1047        17      FL  N977AT    346    LGA   ATL      132
..   ...   ...   ...      ...       ...      ...       ...     ...     ...    ...    ...   ...      ...

Upvotes: 4

Related Questions