How is readxl misreading an excel xlsx file and how to fix it?

Question

I am trying to download and read an xlsx file from 2016 Kansas State faculty counts.

https://www.k-state.edu/pa/faculty/demographics/total/index.html

When the tibble is created, some numbers do not match the original excel file.

url <- "https://www.k-state.edu/pa/faculty/demographics/total/t2016.xlsx"

download.file(url, destfile = "t2016.xlsx",  mode = "wb")

library(readxl)

kansas_state <- read_xlsx("t2016.xlsx", skip = 165, col_names = FALSE)

gender <- kansas_state[1:4]
names(gender) <- c("COLLEGE", "N", "Male", "F")
gender

I am reading the very last table in the excel sheet, with totals for each college.

Then if you open the excel file and look at the last table "University Totals," some numbers do not match. Such as College of Agriculture, but College of Architecture does...?

I have also shared the github public link here for an rstudio project on this example:

https://github.com/AdamUArk/r_readxl_example

Vasily A · Accepted Answer

There are hidden sheets in this file, so read_xlsx() reads from one of those sheets instead of the sheet you normally see. To read the one you want, add sheet = 'Totals' to the arguments of read_xlsx().

How is readxl misreading an excel xlsx file and how to fix it?

Answers (1)

Related Questions