Reputation: 635
I am trying to read a big csv file. Indeed, I want select a subset using a special column which name is Race Color. Reading the file via read.csv, I have the head
library(sqldf)
df <- read.csv(file = 'df.txt', header = T, sep = ";")
head(df)
id Region Race Color ....
1 1 1
2 1 1
3 2 1
4 3 2
5 4 1
6 4 1
I would like to use read.csv.sql for selecting a subset of df without use the read.csv file. For example, I want all the people with Race Color equal to 1. Using read.csv.sql, I have something like
>df <- read.csv.sql("df.txt", sql = "select * from file where Race Color = 1", sep=";", header=T, eol="\n")
but I have the following error
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: near "Color": syntax error
Trying
>df <- read.csv.sql("df.txt", sql = "select * from file where 'Race Color' = 1", sep=";", header=T, eol="\n")
I have df with zero rows.
Any solution?
Upvotes: 1
Views: 1095
Reputation: 24945
R automatically adds a .
to column names with a space on reading in the data to make Race.Color
, but a .
has a special meaning in sql, so that will screw things up.
There is a built in method in sqldf
using square brackets ([Race.Color]
) to explicitly name columns we can use so that we don't run into that problem. You can also use escaped quotes : \"Race.Color\"
This should work:
library(sqldf)
read.csv.sql("test.csv", sql = "select * from file where [Race.Color] = 1", sep=";", header=T, eol="\n")
Upvotes: 3