Reputation: 1316
I'm summarising a SQLite database table through R, however when I use COUNT() it is converting my character field to numeric, i.e. if I have observations 0, 1, 2, n
I'm getting 0, 1, 2, 0
back. I've tried the query in SQLite Manager for Firefox, which works as expected; so I expect the issue is to do with RSQLite.
He's a working example of the problem:
install.packages("RSQLite")
library(RSQLite)
# Connect
db = dbConnect(SQLite(), "~/Desktop/test.sqlite")
# Make a table
dbSendQuery(db, "
CREATE TABLE data
(Year INT,
Station TXT,
Obs TXT)
")
# Generate dummy data
x = data.frame(Year=rep(1995:2004, 100),
Station=rep(c("Bob", "Ringo", "Jarrett", "Heron"), 250),
Obs=rep(c(0:3, "n"), 200))
# Write dummy data to db
dbWriteTable(db, "data", x, append=T, row.names=F)
# Get summary of data back
y = dbGetQuery(db, "
SELECT Year, Station, Obs, COUNT(Obs) AS num FROM data
GROUP BY Year, Station, Obs
")
I'd like to know:
I'm using: R 3.2.2, RStudio 0.99.489, Ubuntu 14.04, RSQLite 1.0.0 & DBI 0.3.1
Upvotes: 3
Views: 299
Reputation: 269526
The reason that happened is that the Obs
column was declared to be of type TXT
which is not a builtin type. If it had been declared as type TEXT
it would have worked as expected.
In SQLite the type of a column is only a hint and each row of that column can still have a different type. Since TXT
is not a known type that is in fact what happens. In the example below the txt
and txt2
columns are declared to have type TXT
in the CREATE
statement and TXT
is not a builtin type in SQLite (TEXT
is). As we see in the y
output below the typeof(txt)
column showing the type of each row of the txt
column does have different types for different rows. Ditto for the txt2
column as shown in the typeof(txt2)
column. When such data is read back into R the first entry is used by R as the type thus txt
having a first entry of 0
with ijnteger type is read into R as integer and txt2
having a first entry of n
with text type is read into R as character. This can be seen by looking at the str(y)
output below. We also note that the text
column is read into R as character since text
is a known datatype in SQLite.
library(RSQLite)
db <- dbConnect(SQLite(), ":memory:")
dbSendQuery(db, "CREATE TABLE d (txt TXT, txt2 TXT, text TEXT)")
x <- data.frame(txt = c(0, "n"), txt2 = c("n", 0), text2 = c(0, "n"))
dbWriteTable(db, "d", x, append = TRUE, row.names = FALSE)
y <- dbGetQuery(db,
"select txt, typeof(txt), txt2, typeof(txt2), text, typeof(text) from d")
y
## txt typeof(txt) txt2 typeof(txt2) text typeof(text)
## 1 0 integer n text 0 text
## 2 0 text 0 integer n text
str(y)
## 'data.frame': 2 obs. of 6 variables:
## $ txt : int 0 0 <------ becomes int since 1st entry of txt is int
## $ typeof(txt) : chr "integer" "text"
## $ txt2 : chr "n" "0" <---- chr since 1st entry is text
## $ typeof(txt2): chr "text" "integer"
## $ text : chr "0" "n" <--- chr since column declared as text in create
## $ typeof(text): chr "text" "text"
For more info see the sqlite documentation on datatypes.
Upvotes: 4
Reputation: 44525
The issue is your CREATE TABLE
command. It should be TEXT
not TXT
:
dbSendQuery(db, "
CREATE TABLE data
(Year INT,
Station TEXT,
Obs TEXT)
")
Upvotes: 1