zesla
zesla

Reputation: 11833

Read specific columns starting from certain rows from excel file using readxl package in R

I'm trying to read an excel file into R. I need to read column A and column C (no B), starting from row 5. Here is what I did:

library(readxl)

read_excel('./data/temp.xlsx',  skip=5,
            range=cell_cols(c('A', 'C')))

The code above does not work. First, it does not skip 5 rows. It reads from first row. Secondly, it also read column B, which I do not want.

Does anyone know what I did wrong? I know how to specify the cell range, but how should I pick the specific columns I need?

Upvotes: 2

Views: 5579

Answers (2)

Alex Ho
Alex Ho

Reputation: 438

Borrowing the content from readxl.tidyverse.org. One of your questions regarding why column B is also added is because:

## columns only
read_excel(..., range = cell_cols(1:26))
## is equivalent to all of these
read_excel(..., range = cell_cols(c(1, 26)))
read_excel(..., range = cell_cols("A:Z"))
read_excel(..., range = cell_cols(LETTERS))
read_excel(..., range = cell_cols(c("A", "Z"))

Hence, cell_cols("A:C") is equivalent to cell_cols(c("A", "C"))

Previously, what I did was in one of my projects was the following. I guess you can adapt the following and extract the data by column, then join them together.

ranges = list("A5:H18", "A28:H39", "A50:H61")

extracted <- lapply(ranges, function(each_range){
                read_excel(filepath, sheet = 1, range = each_range, na = c("", "-"), col_names = cname, col_types = ctype)
        }) %>%
                reduce(full_join) 

Regarding your question about skipping rows, I'm also not sure because I was also searching for this answer, and found your question on stackoverflow.

[edit] I think I found some readings on https://github.com/tidyverse/readxl/issues/577. Anyway, if you use range, you can't do any skip, as range takes precedence over skip and others

Upvotes: 0

Joris C.
Joris C.

Reputation: 6244

You can use the column_types argument (check ?read_excel) to skip columns from being read. For instance, if columns A and C are numeric:

readxl::read_excel("/path/to/data.xlsx", 
    col_names = FALSE, 
    skip = 5, 
    col_types=c("numeric", "skip", "numeric"))

NB: if the column types are unknown initially you could read them as text and convert them afterwards.

Upvotes: 4

Related Questions