Specify the number of columns read_csv is applied to

Question

Is it possible to pass column indices to read_csv?

I am passing many CSV files with different header names to read_csv, so rather than specifying names I wish to use column indices.

Is this possible?

df.list <- lapply(seq_along(myExcelCSV), function(i) read_csv(myExcelCSV[i], skip = headers2skip[i] - 1))

Ben Bolker · Accepted Answer

According to ?readr::read_csv, the col_types argument accepts a cols() specification or, alternatively, a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or ‘_’/‘-’ to skip the column.
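As a small illustration of that compact string passed to read_csv()'s col_types argument, "c_d" reads the first column as character, skips the second and reads the third as double. This is a sketch only: the three-column file and its header names are made up and written to a temporary path so the snippet runs on its own.

```r
library(readr)

# Made-up three-column file written to a temporary path.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,note,value", "a,x,1.5", "b,y,2.5"), tmp)

# One character per column: "c" = character, "_" = skip, "d" = double.
df <- read_csv(tmp, col_types = "c_d")
df  # two columns remain: id (character) and value (double)
```

Because the string is positional, the skipped column is identified by where it sits, not by its header name.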

If you know the total number of columns in the file, you could do it like this:

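library(readr)

# Read a CSV, skipping the columns at positions skip_cols.
# tot_cols must be the total number of columns in the file.
my_read <- function(..., tot_cols, skip_cols = numeric(0)) {
  csr <- rep("?", tot_cols)          # "?" = let readr guess the type
  csr[skip_cols] <- "_"              # "_" = skip this column
  csr <- paste(csr, collapse = "")   # e.g. "?_??_" for tot_cols = 5, skip_cols = c(2, 5)
  read_csv(..., col_types = csr)
}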

If you don't know the total number of columns in advance, you could add code to this function that reads just the first line of the file and counts the number of columns it contains.
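One way to count the columns from the first line is sketched below. It is an assumption-laden sketch, not part of readr: count_csv_cols is a hypothetical helper name, and it assumes a comma delimiter with double-quoted fields. It uses only base R (readLines() and scan()), reading just the first line after an optional number of lines to skip.

```r
# Hypothetical helper (not part of readr): count the columns of a CSV by
# reading only the first line after `skip` lines. With quote = "\"", scan()
# does not treat a comma inside a double-quoted field as a separator.
count_csv_cols <- function(file, skip = 0) {
  first <- readLines(file, n = skip + 1)[skip + 1]
  fields <- scan(text = first, what = character(), sep = ",",
                 quote = "\"", quiet = TRUE)
  length(fields)
}

# Made-up file: one junk line, then a header with a quoted comma.
tmp <- tempfile(fileext = ".csv")
writeLines(c("junk line", "a,\"b,c\",d", "1,2,3"), tmp)
count_csv_cols(tmp, skip = 1)  # header line has 3 fields
```

Combined with my_read above, a call could look like my_read(f, tot_cols = count_csv_cols(f), skip_cols = c(2, 4)), where f is one of your file paths and c(2, 4) is only an example. If you also pass skip to read_csv, give count_csv_cols the same skip value so it counts the header line rather than a junk line.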

FWIW, the skip argument might not do what you think it does: it skips rows, not columns. As I read ?readr::read_csv, there doesn't seem to be any convenient way to skip or include particular columns (by name or by index) except through an ad hoc mechanism like the one suggested above. This might be worth a feature request or discussion on the readr issues list (e.g. adding cols_include and/or cols_exclude arguments that could be specified by name or position).
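Newer readr releases address this directly: from readr 2.0.0 onward, read_csv() has a col_select argument that selects columns by name or by position. The sketch below assumes readr 2.0.0 or later and a made-up file with two junk lines above its header. It also shows the difference described above: skip drops lines, while col_select chooses columns.

```r
library(readr)  # assumes readr >= 2.0.0 (col_select, show_col_types)

# Made-up file: two junk lines above the real header, then three columns.
tmp <- tempfile(fileext = ".csv")
writeLines(c("junk", "more junk", "id,note,value", "a,x,1.5", "b,y,2.5"), tmp)

# skip = 2 drops the first two *lines*; col_select = c(1, 3) keeps the
# first and third *columns* by position, whatever their header names are.
df <- read_csv(tmp, skip = 2, col_select = c(1, 3), show_col_types = FALSE)
df
```

For the question's setup, something like Map(function(f, n) read_csv(f, skip = n - 1, col_select = c(1, 3)), myExcelCSV, headers2skip) would apply a per-file skip and the same column positions to every file. The positions c(1, 3) are only an example.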
