MAOC
MAOC

Reputation: 635

in R, read special columns with read.csv.sql

I am trying to read a big csv file. Indeed, I want select a subset using a special column which name is Race Color. Reading the file via read.csv, I have the head

library(sqldf)
df <-  read.csv(file = 'df.txt', header = T, sep = ";")
head(df)

id  Region  Race Color  .... 
 1       1          1
 2       1          1
 3       2          1
 4       3          2
 5       4          1
 6       4          1

I would like to use read.csv.sql for selecting a subset of df without use the read.csv file. For example, I want all the people with Race Color equal to 1. Using read.csv.sql, I have something like

>df <- read.csv.sql("df.txt", sql = "select * from file where Race Color = 1", sep=";", header=T, eol="\n")

but I have the following error

Error in sqliteSendQuery(con, statement, bind.data) : 
error in statement: near "Color": syntax error

Trying

>df <- read.csv.sql("df.txt", sql = "select * from file where 'Race Color' = 1", sep=";", header=T, eol="\n")

I have df with zero rows.

Any solution?

Upvotes: 1

Views: 1095

Answers (1)

jeremycg
jeremycg

Reputation: 24945

R automatically adds a . to column names with a space on reading in the data to make Race.Color, but a . has a special meaning in sql, so that will screw things up.

There is a built in method in sqldf using square brackets ([Race.Color]) to explicitly name columns we can use so that we don't run into that problem. You can also use escaped quotes : \"Race.Color\"

This should work:

library(sqldf)

read.csv.sql("test.csv", sql = "select * from file where [Race.Color] = 1", sep=";", header=T, eol="\n")

Upvotes: 3

Related Questions