Reputation: 81
I have very large csv files (2.3 GB). I only want to read certain columns that could be or could not be there.
I am using the following code that was suggested here Only read limited number of columns
library(sqldf)
loc <- read.csv.sql("data.csv",
sql = "select locID, City, CRESTA, Latitude, Longitude from file",
sep = ",")
How can I deal with the situation when for example the column "City" is not in the csv?
Upvotes: 1
Views: 354
Reputation: 269501
This finds out which columns are available, intersects their names with the names of the columns that are wanted and only reads those.
library(sqldf)
nms_wanted <- c("locID", "City", "CRESTA", "Latitude", "Longitude")
nms_avail <- names(read.csv("data.csv", nrows = 0))
nms <- intersect(nms_avail, nms_wanted)
fn$read.csv.sql("data.csv", "select `toString(nms)` from file")
Upvotes: 1