janf
janf

Reputation: 81

Read certain columns if exist using read.csv.sql from sqldf

I have very large csv files (2.3 GB). I only want to read certain columns that could be or could not be there.

I am using the following code that was suggested here Only read limited number of columns

library(sqldf) 
loc <- read.csv.sql("data.csv",
                    sql = "select locID, City, CRESTA, Latitude, Longitude from file",
                    sep = ",")

How can I deal with the situation when for example the column "City" is not in the csv?

Upvotes: 1

Views: 354

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269501

This finds out which columns are available, intersects their names with the names of the columns that are wanted and only reads those.

library(sqldf)

nms_wanted <- c("locID", "City", "CRESTA", "Latitude", "Longitude")
nms_avail <- names(read.csv("data.csv", nrows = 0))
nms <- intersect(nms_avail, nms_wanted)
fn$read.csv.sql("data.csv", "select `toString(nms)` from file")

Upvotes: 1

Related Questions