Reputation: 1720
I am trying to extract information from an sqlite table using dplyr
.
Norway <- tbl(conn, "own_fleet") %>% (mmsi==235060247) %>% filter(timestamp>='2018-08-16T00:00:01') %>% collect()
This results in an error:
Error in eval(rhs, env, env) : object 'mmsi' not found
However mmsi
exists in the table and is the first column which I confirmed by running PRAGMA table_info(own_fleet)
.
sqlite> PRAGMA table_info(own_fleet);
0|mmsi|INTEGER|0||0
1|lat|REAL|0||0
2|lon|REAL|0||0
3|rateOfTurn|INTEGER|0||0
4|sogKts|REAL|0||0
5|cog|REAL|0||0
6|heading|REAL|0||0
7|timestamp|TEXT|0||0
8|imoNumber|INTEGER|0||0
9|dimensionToBow|INTEGER|0||0
10|dimensionToStern|INTEGER|0||0
11|dimensionToPort|INTEGER|0||0
12|dimensionToStarboard|INTEGER|0||0
13|etaMonth|INTEGER|0||0
14|etaDay|INTEGER|0||0
15|etaHour|INTEGER|0||0
16|etaMinute|INTEGER|0||0
17|draught|INTEGER|0||0
18|name|TEXT|0||0
19|destination|TEXT|0||0
20|callsign|TEXT|0||0
Also timestamp
is stored as a character variable in the table. Is there any way of converting it into a date format using say ymd_hms()
from lubridate
inside a filter()
from dplyr
?
Upvotes: 1
Views: 238
Reputation: 1370
I had similar error message(s) when trying to mutate()
and filter()
a (connection to a) database. From these examples I got around the issue by using dplyr::collect()
before the offending filter()
and/or mutate()
.
Make some sample data and write it to a new database:
library(tidyverse)
library(lubridate)
library(RSQLite)
library(dbplyr)
# make a sample table, with dates
tibble(
datetime = seq(ymd_hms('2020-03-11 12:00:00'), ymd_hms('2020-03-30 12:00:00'), by = '5 days'),
id = 1:4
) %>%
mutate_all(as.character) %>%
{. ->> my_data}
my_data
# # A tibble: 4 x 2
# datetime id
# <dttm> <int>
# 1 2020-03-11 12:00:00 1
# 2 2020-03-16 12:00:00 2
# 3 2020-03-21 12:00:00 3
# 4 2020-03-26 12:00:00 4
# make database
conn <- DBI::dbConnect(RSQLite::SQLite(), 'my_database.sqlite')
# add data to the database
DBI::dbWriteTable(conn, 'my_data', my_data, overwrite = TRUE)
Retrieve the table from the database; note I saved the datetimes in character format to prevent them from being converted to a numeric format when saving into the database:
# retrieve database
my_database <- tbl(conn, 'my_data')
my_database
# # Source: table<my_data> [?? x 2]
# # Database: sqlite 3.37.0 [\\Ucstaff\dfs\Associated Organisations\Institute for Applied
# # Ecology\Personnel\ALLAN\Masters\r\projects\shiny apps\cotter acoustic project\cotter acoustic project_1-5\my_database.sqlite]
# datetime id
# <chr> <chr>
# 1 2020-03-11 12:00:00 1
# 2 2020-03-16 12:00:00 2
# 3 2020-03-21 12:00:00 3
# 4 2020-03-26 12:00:00 4
Then, if we try and mutate()
the datetime
column using ymd_hms()
, we get an error.
my_database %>%
mutate(
datetime = ymd_hms(datetime)
)
# Error: no such function: ymd_hms
The error suggests that R can't find the ymd_hms()
function from lubridate
, so I tried explicitly calling it:
my_database %>%
mutate(
datetime = lubridate::ymd_hms(datetime)
)
# Error in lapply(list(...), .num_to_date) : object 'datetime' not found
This error suggests the column doesn't exist, which is strange.
Similarly, using filter()
returns unexpected results than when filtering a regular data.frame
or tibble
. Normally I find that you can specify a number as a filter for a character column without an issue (I assume R must convert one or the other to find matches), but this doesn't seem to work with the database connection.
my_database %>%
filter(
id == 2
)
# Source: lazy query [?? x 2]
# Database: sqlite 3.37.0 [\\Ucstaff\dfs\Associated Organisations\Institute for Applied
# Ecology\Personnel\ALLAN\Masters\r\projects\shiny apps\cotter acoustic project\cotter acoustic project_1-5\my_database.sqlite]
# ... with 2 variables: datetime <chr>, id <chr>
## no results
my_database %>%
filter(
id == '2'
)
# # Source: lazy query [?? x 2]
# # Database: sqlite 3.37.0 [\\Ucstaff\dfs\Associated Organisations\Institute for Applied
# # Ecology\Personnel\ALLAN\Masters\r\projects\shiny apps\cotter acoustic project\cotter acoustic project_1-5\my_database.sqlite]
# datetime id
# <chr> <chr>
# 1 2020-03-16 12:00:00 2
So, we can get around these issues by using collect()
before filter()
or mutate()
, which retrieves the data to a local tibble
. I guess this then means you're working locally rather than through a database, but it's one way around it.
my_database %>%
collect %>%
filter(
id == 2
)
# # A tibble: 1 x 2
# datetime id
# <chr> <chr>
# 1 2020-03-16 12:00:00 2
my_database %>%
collect %>%
mutate(datetime = ymd_hms(datetime))
# # A tibble: 4 x 2
# datetime id
# <dttm> <chr>
# 1 2020-03-11 12:00:00 1
# 2 2020-03-16 12:00:00 2
# 3 2020-03-21 12:00:00 3
# 4 2020-03-26 12:00:00 4
Upvotes: 1