jennifer.cl
jennifer.cl

Reputation: 225

selecting every Nth column in using SQLDF or read.csv.sql

I am rather new to using SQL statements, and am having a little trouble using them to select the desired columns from a large table and pulling them into R.

I want to take a csv file and read selected columns into r, in particular, every 9th and 10th column. In R, something like:

read.csv.sql("myfile.csv", sql(select * from file [EVERY 9th and 10th COLUMN])

My trawl of the internet suggests that selecting every nth row could be done with an SQL statement using MOD something like this (please correct me if I am wrong):

"SELECT *
        FROM   file
        WHERE  (ROWID,0) IN (SELECT ROWID, MOD(ROWNUM,9) OR MOD(ROWNUM,10)"

Is there a way to make this work for columns? Thanks in advance.

Upvotes: 1

Views: 1409

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269491

read.csv read.csv would be adequate for this:

# determine number of columns
DF1 <- read.csv(myfile, nrows = 1)
nc <- ncol(DF1)

# create a list nc long where unwanted columns are NULL and wanted are NA
colClasses <- rep(rep(list("NULL", NA), c(8, 2)), length = nc)

# read in
DF <- read.csv(myfile, colClasses = colClasses)

sqldf To use sqldf replace the last line with these:

nms <- names(DF1)
vars <- toString(nms[is.na(colClasses)])
DF <- fn$read.csv.sql(myfile, "select $vars from file")

UPDATE: switched to read.csv.sql

UPDATE 2: correction.

Upvotes: 2

Related Questions