Reputation: 34703
I'm using read_excel
for speed and simplicity to import an Excel file.
Unfortunately, there's as yet no capability of excluding selected columns which won't be needed from the data set; to save effort, I am naming such columns "x"
with the col_names
argument, which is easier than trying to keep track of x1
, x2
, and so on.
I'd then like to exclude such columns on the fly if possible to avoid an extra step of copying, so in pseudocode:
read_excel("data.xlsx", col_names = c("x", "keep", "x"))[ , !"x"]
We can use the sample data set included with the readxl
package for illustration:
library(readxl)
DF <- read_excel(system.file("extdata/datasets.xlsx", package = "readxl"),
col_names = c("x", "x", "length", "width", "x"), skip = 1L)
The approaches I've seen that work don't exactly work on the fly, e.g., having stored DF
, we can now do:
DF <- DF[ , -grep("^x$", names(DF))]
This works but requires making a copy of DF
by storing it, then overwriting it; I'd prefer to remove the columns in the same command as read_excel
to allocate DF
properly ab initio.
Other similar approaches require declaring temporary variables, which I prefer to avoid if possible, e.g.,
col_names <- c("x", "x", "length", "width", "x")
DF <- read_excel(system.file("extdata/datasets.xlsx", package = "readxl"),
col_names = col_names, skip = 1L)[ , -grep("^x$", col_names)]
Is there a way to axe these columns without creating unnecessary temporary variables?
(I could convert to data.table
, but am wondering if there's a way to do so without data.table
)
Upvotes: 0
Views: 265
Reputation: 43334
There actually is a way to do this in readxl::read_excel
, though it's a little hidden, and I have no idea if the columns are read into memory [temporarily] regardless. The trick is to specify column types, putting in "blank"
for those you don't want:
readxl::read_excel(system.file("extdata/datasets.xlsx", package = "readxl"),
col_types = c('blank', 'blank', 'numeric', 'numeric', 'text'))
## # A tibble: 150 x 3
## Petal.Length Petal.Width Species
## <dbl> <dbl> <chr>
## 1 1.4 0.2 setosa
## 2 1.4 0.2 setosa
## 3 1.3 0.2 setosa
## 4 1.5 0.2 setosa
## 5 1.4 0.2 setosa
## 6 1.7 0.4 setosa
## 7 1.4 0.3 setosa
## 8 1.5 0.2 setosa
## 9 1.4 0.2 setosa
## 10 1.5 0.1 setosa
## # ... with 140 more rows
The caveat is that you need to know all the data types of the columns you want, though I suppose you could always just start with text and clean up later with type.convert
or whatnot.
Upvotes: 2
Reputation: 35242
I don't see an easy way to avoid copying. But a one liner is achievable using piping, needing no temporary variables. E.g.:
library(magrittr)
read_excel(system.file("extdata/datasets.xlsx", package = "readxl"),
col_names = c("x", "x", "length", "width", "x"), skip = 1L) %>%
extract(, -grep("^x$", names(.))) ->
DF
Upvotes: 2