adm
adm

Reputation: 364

How is readxl misreading an excel xlsx file and how to fix it?

I am trying to download and read an xlsx file from 2016 Kansas State faculty counts.

https://www.k-state.edu/pa/faculty/demographics/total/index.html

When the tibble is created, some numbers do not match the original excel file.

url <- "https://www.k-state.edu/pa/faculty/demographics/total/t2016.xlsx"

download.file(url, destfile = "t2016.xlsx",  mode = "wb")

library(readxl)

kansas_state <- read_xlsx("t2016.xlsx", skip = 165, col_names = FALSE)

gender <- kansas_state[1:4]
names(gender) <- c("COLLEGE", "N", "Male", "F")
gender

I am reading the very last table in the excel sheet, with totals for each college.

enter image description here

Then if you open the excel file and look at the last table "University Totals," some numbers do not match. Such as College of Agriculture, but College of Architecture does...?

enter image description here

I have also shared the github public link here for an rstudio project on this example:

https://github.com/AdamUArk/r_readxl_example

Upvotes: 0

Views: 75

Answers (1)

Vasily A
Vasily A

Reputation: 8646

There are hidden sheets in this file, so read_xlsx() reads from one of those sheets instead of the sheet you normally see. To read the one you want, add sheet = 'Totals' to the arguments of read_xlsx().

Upvotes: 2

Related Questions