MikeRSpencer
MikeRSpencer

Reputation: 1316

R SQLite converts char to numeric when using COUNT

I'm summarising a SQLite database table through R, however when I use COUNT() it is converting my character field to numeric, i.e. if I have observations 0, 1, 2, n I'm getting 0, 1, 2, 0 back. I've tried the query in SQLite Manager for Firefox, which works as expected; so I expect the issue is to do with RSQLite.

He's a working example of the problem:

install.packages("RSQLite")
library(RSQLite)

# Connect
db = dbConnect(SQLite(), "~/Desktop/test.sqlite")

# Make a table
dbSendQuery(db, "
   CREATE TABLE data
      (Year INT,
      Station TXT,
      Obs TXT)
")

# Generate dummy data
x = data.frame(Year=rep(1995:2004, 100),
    Station=rep(c("Bob", "Ringo", "Jarrett", "Heron"), 250),
    Obs=rep(c(0:3, "n"), 200))

# Write dummy data to db
dbWriteTable(db, "data", x, append=T, row.names=F)

# Get summary of data back
y = dbGetQuery(db, "
   SELECT Year, Station, Obs, COUNT(Obs) AS num FROM data
   GROUP BY Year, Station, Obs
")

I'd like to know:

  1. Why is it doing this?
  2. How can I fix it?

I'm using: R 3.2.2, RStudio 0.99.489, Ubuntu 14.04, RSQLite 1.0.0 & DBI 0.3.1

Upvotes: 3

Views: 299

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269526

The reason that happened is that the Obs column was declared to be of type TXT which is not a builtin type. If it had been declared as type TEXT it would have worked as expected.

In SQLite the type of a column is only a hint and each row of that column can still have a different type. Since TXT is not a known type that is in fact what happens. In the example below the txt and txt2 columns are declared to have type TXT in the CREATE statement and TXT is not a builtin type in SQLite (TEXT is). As we see in the y output below the typeof(txt) column showing the type of each row of the txt column does have different types for different rows. Ditto for the txt2 column as shown in the typeof(txt2) column. When such data is read back into R the first entry is used by R as the type thus txt having a first entry of 0 with ijnteger type is read into R as integer and txt2 having a first entry of n with text type is read into R as character. This can be seen by looking at the str(y) output below. We also note that the text column is read into R as character since text is a known datatype in SQLite.

library(RSQLite)
db <- dbConnect(SQLite(), ":memory:")

dbSendQuery(db, "CREATE TABLE d (txt TXT, txt2 TXT, text TEXT)")
x <- data.frame(txt = c(0, "n"), txt2 = c("n", 0), text2 = c(0, "n")) 
dbWriteTable(db, "d", x, append = TRUE, row.names = FALSE)

y <- dbGetQuery(db, 
         "select txt, typeof(txt), txt2, typeof(txt2), text, typeof(text) from d")

y
##  txt typeof(txt) txt2 typeof(txt2) text typeof(text)
## 1   0     integer    n         text    0         text
## 2   0        text    0      integer    n         text

str(y)
## 'data.frame':   2 obs. of  6 variables:
## $ txt         : int  0 0   <------ becomes int since 1st entry of txt is int
## $ typeof(txt) : chr  "integer" "text"
## $ txt2        : chr  "n" "0"  <---- chr since 1st entry is text
## $ typeof(txt2): chr  "text" "integer"
## $ text        : chr  "0" "n"  <--- chr since column declared as text in create
## $ typeof(text): chr  "text" "text"

For more info see the sqlite documentation on datatypes.

Upvotes: 4

Thomas
Thomas

Reputation: 44525

The issue is your CREATE TABLE command. It should be TEXT not TXT:

dbSendQuery(db, "
   CREATE TABLE data
      (Year INT,
      Station TEXT,
      Obs TEXT)
")

Upvotes: 1

Related Questions