TheGoat

Reputation: 2867

Specify the number of columns read_csv is applied to

Is it possible to pass column indices to read_csv?

I am passing many CSV files with different header names to read_csv, so rather than specifying column names I would like to select columns by index.

Is this possible?

# assumes headers2skip holds one value per file, in the same order as myExcelCSV
df.list <- Map(function(f, s) read_csv(f, skip = s - 1), myExcelCSV, headers2skip)

Upvotes: 4

Views: 799

Answers (1)

Ben Bolker

Reputation: 226232

From the documentation of the col_types argument in ?readr::read_csv:

Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or ‘_’/‘-’ to skip the column.
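As a small illustration of that compact string (the file contents and column names here are made up for the example), something like this should keep the first and third columns and drop the second:

```r
library(readr)

# Hypothetical three-column file, written to a temporary path for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,a,2.5", "2,b,3.5"), tmp)

# One character per column: "i" = integer, "_" = skip, "d" = double.
df <- read_csv(tmp, col_types = "i_d")
names(df)
```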

If you know the total number of columns in the file you could do it like this:

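library(readr)

# tot_cols:  total number of columns in the file
# skip_cols: indices of the columns to drop
my_read <- function(..., tot_cols, skip_cols = numeric(0)) {
  csr <- rep("?", tot_cols)          # guess the type of every column by default
  csr[skip_cols] <- "_"              # mark the unwanted columns as skipped
  csr <- paste(csr, collapse = "")   # e.g. "?_??_"
  read_csv(..., col_types = csr)
}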

If you don't know the total number of columns in advance, you could add code to this function that reads just the first line of the file and counts the number of columns returned.
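A minimal sketch of that idea follows. The name my_read2 and its file and skip arguments are my own additions for illustration, and it assumes the header is the first line after any skipped rows:

```r
library(readr)

# Hypothetical variant of my_read() that counts the columns itself.
my_read2 <- function(file, skip_cols = numeric(0), skip = 0, ...) {
  # Read only the header line, as data, to find out how many columns there are.
  header <- read_csv(file, skip = skip, n_max = 1, col_names = FALSE,
                     col_types = cols(.default = col_character()))
  csr <- rep("?", ncol(header))   # guess every column by default
  csr[skip_cols] <- "_"           # drop the requested column positions
  read_csv(file, skip = skip, col_types = paste(csr, collapse = ""), ...)
}

# Usage on a made-up four-column file: drop columns 2 and 4.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d", "1,2,3,4", "5,6,7,8"), tmp)
df <- my_read2(tmp, skip_cols = c(2, 4))
```

This reads the header line twice, which should be cheap next to reading the whole file. It does not check that skip_cols stays within the number of columns found.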

FWIW, the skip argument might not do what you think it does: it skips rows rather than selecting or deselecting columns. As I read ?readr::read_csv(), there doesn't seem to be any convenient way to skip and/or include particular columns (by name or by index) except by some ad hoc mechanism such as the one suggested above. This might be worth a feature request or discussion on the readr issues list (e.g. adding cols_include and/or cols_exclude arguments that could be specified by name or position).
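That may no longer hold for recent versions: as far as I know, readr 2.0.0 and later give read_csv a col_select argument that accepts positions or names. A sketch, assuming such a version is installed and using a made-up file:

```r
library(readr)

# Hypothetical three-column file for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6"), tmp)

# Assumes readr >= 2.0.0, where col_select is available.
df_keep <- read_csv(tmp, col_select = c(1, 3), show_col_types = FALSE)  # include by position
df_drop <- read_csv(tmp, col_select = -2, show_col_types = FALSE)       # exclude by position
```

With older versions of readr the col_types approach above is still the way to go.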

Upvotes: 6
