Reputation: 1720

How to extract using dplyr filter on first column of table in sqlite DB

I am trying to extract information from an sqlite table using dplyr.

Norway <- tbl(conn, "own_fleet") %>% (mmsi==235060247) %>% filter(timestamp>='2018-08-16T00:00:01') %>% collect()

This results in an error:

Error in eval(rhs, env, env) : object 'mmsi' not found

However mmsi exists in the table and is the first column which I confirmed by running PRAGMA table_info(own_fleet).

sqlite> PRAGMA table_info(own_fleet);
0|mmsi|INTEGER|0||0
1|lat|REAL|0||0
2|lon|REAL|0||0
3|rateOfTurn|INTEGER|0||0
4|sogKts|REAL|0||0
5|cog|REAL|0||0
6|heading|REAL|0||0
7|timestamp|TEXT|0||0
8|imoNumber|INTEGER|0||0
9|dimensionToBow|INTEGER|0||0
10|dimensionToStern|INTEGER|0||0
11|dimensionToPort|INTEGER|0||0
12|dimensionToStarboard|INTEGER|0||0
13|etaMonth|INTEGER|0||0
14|etaDay|INTEGER|0||0
15|etaHour|INTEGER|0||0
16|etaMinute|INTEGER|0||0
17|draught|INTEGER|0||0
18|name|TEXT|0||0
19|destination|TEXT|0||0
20|callsign|TEXT|0||0

Also timestamp is stored as a character variable in the table. Is there any way of converting it into a date format using say ymd_hms() from lubridate inside a filter() from dplyr?

Upvotes: 1

Answers (1)

hugh-allan

Reputation: 1370

I had similar error message(s) when trying to mutate() and filter() a (connection to a) database. From these examples I got around the issue by using dplyr::collect() before the offending filter() and/or mutate().

Make some sample data and write it to a new database:

library(tidyverse)
library(lubridate)
library(RSQLite)
library(dbplyr)

# make a sample table, with dates
tibble(
  datetime = seq(ymd_hms('2020-03-11 12:00:00'), ymd_hms('2020-03-30 12:00:00'), by = '5 days'), 
  id = 1:4
) %>% 
  mutate_all(as.character) %>% 
  {. ->> my_data}

my_data

# # A tibble: 4 x 2
#   datetime               id
#   <dttm>              <int>
# 1 2020-03-11 12:00:00     1
# 2 2020-03-16 12:00:00     2
# 3 2020-03-21 12:00:00     3
# 4 2020-03-26 12:00:00     4


# make database
conn <- DBI::dbConnect(RSQLite::SQLite(), 'my_database.sqlite')

# add data to the database
DBI::dbWriteTable(conn, 'my_data', my_data, overwrite = TRUE)

Retrieve the table from the database; note I saved the datetimes in character format to prevent them from being converted to a numeric format when saving into the database:

# retrieve database
my_database <- tbl(conn, 'my_data')

my_database

# # Source:   table<my_data> [?? x 2]
# # Database: sqlite 3.37.0 [\\Ucstaff\dfs\Associated Organisations\Institute for Applied
# #   Ecology\Personnel\ALLAN\Masters\r\projects\shiny apps\cotter acoustic project\cotter acoustic project_1-5\my_database.sqlite]
#   datetime            id   
#   <chr>               <chr>
# 1 2020-03-11 12:00:00 1    
# 2 2020-03-16 12:00:00 2    
# 3 2020-03-21 12:00:00 3    
# 4 2020-03-26 12:00:00 4

Then, if we try and mutate() the datetime column using ymd_hms(), we get an error.

my_database %>% 
  mutate(
    datetime = ymd_hms(datetime)
  )
# Error: no such function: ymd_hms

The error suggests that R can't find the ymd_hms() function from lubridate, so I tried explicitly calling it:

my_database %>% 
  mutate(
    datetime = lubridate::ymd_hms(datetime)
  )
# Error in lapply(list(...), .num_to_date) : object 'datetime' not found

This error suggests the column doesn't exist, which is strange.

Similarly, using filter() returns unexpected results than when filtering a regular data.frame or tibble. Normally I find that you can specify a number as a filter for a character column without an issue (I assume R must convert one or the other to find matches), but this doesn't seem to work with the database connection.

my_database %>% 
  filter(
    id == 2
  )

# Source:   lazy query [?? x 2]
# Database: sqlite 3.37.0 [\\Ucstaff\dfs\Associated Organisations\Institute for Applied
#   Ecology\Personnel\ALLAN\Masters\r\projects\shiny apps\cotter acoustic project\cotter acoustic project_1-5\my_database.sqlite]
# ... with 2 variables: datetime <chr>, id <chr>

## no results


my_database %>% 
  filter(
    id == '2'
  )

# # Source:   lazy query [?? x 2]
# # Database: sqlite 3.37.0 [\\Ucstaff\dfs\Associated Organisations\Institute for Applied
# #   Ecology\Personnel\ALLAN\Masters\r\projects\shiny apps\cotter acoustic project\cotter acoustic project_1-5\my_database.sqlite]
#   datetime            id   
#   <chr>               <chr>
# 1 2020-03-16 12:00:00 2

So, we can get around these issues by using collect() before filter() or mutate(), which retrieves the data to a local tibble. I guess this then means you're working locally rather than through a database, but it's one way around it.

my_database %>% 
  collect %>% 
  filter(
    id == 2
  )

# # A tibble: 1 x 2
#   datetime            id   
#   <chr>               <chr>
# 1 2020-03-16 12:00:00 2  


my_database %>% 
  collect %>% 
  mutate(datetime = ymd_hms(datetime))

# # A tibble: 4 x 2
#   datetime            id   
#   <dttm>              <chr>
# 1 2020-03-11 12:00:00 1    
# 2 2020-03-16 12:00:00 2    
# 3 2020-03-21 12:00:00 3    
# 4 2020-03-26 12:00:00 4

Upvotes: 1

How to extract using dplyr filter on first column of table in sqlite DB

Answers (1)

Related Questions