EcologyTom
EcologyTom

Reputation: 2500

openxlsx function read.xlsx fails to correctly read dates in R

I'm trying to load a .xlsx file in R, using the openxlsx package. Unfortunately, the spreadsheet has some strange formatting in the date column, which is in the format "Month/Day/Year", e.g. 9/21/2014. (Excel recognises this as a date format). When importing with read.xlsx, the month and day are missed off, leaving only the year as a numeric column. I suspect it is something to do with the / character.

df <- read.xlsx("The File.xlsx", sheet = "Sheet 1")

head(df)
  Number       Type         Other.Type     Date
    1           902             611        2014     
    2           902             611        2014   
    3           902             611        2014    
    4           795             966        2014 
 ...

I've tried including the detectDates = TRUE argument, but that just gives NAs.

I can't edit the spreadsheet as the data belongs to someone else and I have just been given access to it. Is there an equivalent of the colClasses argument from the xlsx package, or any other way of getting the data into R?

Many thanks

Upvotes: 3

Views: 10616

Answers (2)

Mnl
Mnl

Reputation: 977

This worked for me after changing the format in excel to: date "2012-03-14"

library(openxlsx)
read.xlsx(xlsxFile = "The File.xlsx", sheet = "sheet 1" , detectDates = TRUE)

Upvotes: 4

EcologyTom
EcologyTom

Reputation: 2500

This answer is just for completeness, in case anyone else ends up here with a similar problem. All thanks due to @StéphaneLaurent, who provided the suggestion in the comments.

Switching to the readxl package resolved the problems. Note be sure to check the the help file for the read_xlsx call; particularly for the col_types argument. This package will attempt to set each vector's data type on import, and if there are any inconsistencies it will produce warnings such as

In read_fun(path = path, sheet_i = sheet, limits = limits,  ... :
Expecting numeric in F1107 / R1107C6: got '?'

These are not a serious issue, so don't be put off using the package. Thanks, Stéphane!


EDIT ~ 1 week later

After using readxl in another script with different data, I have switched back to using openxlsx as my basic go-to package. Although readxl worked well as a workaround for my original issue, the number of warnings() it throws up is really irritating. In this second case, it was becoming unusable; each time I ran a line of code (whether readxl was involved or not), it would trigger warnings about e.g. Unknown or uninitialised column. This was only resolved by closing down the R session and starting again. I'm sure it is only because I'm making a small mistake with readxl, but unless I'm faced with the same situation as above with mis-formatted dates, I will stick with openxlsx, which I have generally found straightforward to use.

Upvotes: 2

Related Questions