Reputation: 2500
I'm trying to load a .xlsx
file in R, using the openxlsx
package. Unfortunately, the spreadsheet has some strange formatting in the date column, which is in the format "Month/Day/Year", e.g. 9/21/2014
. (Excel recognises this as a date format). When importing with read.xlsx
, the month and day are missed off, leaving only the year as a numeric column. I suspect it is something to do with the /
character.
df <- read.xlsx("The File.xlsx", sheet = "Sheet 1")
head(df)
Number Type Other.Type Date
1 902 611 2014
2 902 611 2014
3 902 611 2014
4 795 966 2014
...
I've tried including the detectDates = TRUE
argument, but that just gives NAs
.
I can't edit the spreadsheet as the data belongs to someone else and I have just been given access to it. Is there an equivalent of the colClasses
argument from the xlsx
package, or any other way of getting the data into R?
Many thanks
Upvotes: 3
Views: 10616
Reputation: 977
This worked for me after changing the format in excel to: date "2012-03-14"
library(openxlsx)
read.xlsx(xlsxFile = "The File.xlsx", sheet = "sheet 1" , detectDates = TRUE)
Upvotes: 4
Reputation: 2500
This answer is just for completeness, in case anyone else ends up here with a similar problem. All thanks due to @StéphaneLaurent, who provided the suggestion in the comments.
Switching to the readxl
package resolved the problems. Note be sure to check the the help file for the read_xlsx
call; particularly for the col_types
argument. This package will attempt to set each vector's data type on import, and if there are any inconsistencies it will produce warnings such as
In read_fun(path = path, sheet_i = sheet, limits = limits, ... :
Expecting numeric in F1107 / R1107C6: got '?'
These are not a serious issue, so don't be put off using the package. Thanks, Stéphane!
EDIT ~ 1 week later
After using readxl
in another script with different data, I have switched back to using openxlsx
as my basic go-to package. Although readxl
worked well as a workaround for my original issue, the number of warnings()
it throws up is really irritating. In this second case, it was becoming unusable; each time I ran a line of code (whether readxl
was involved or not), it would trigger warnings about e.g. Unknown or uninitialised column
. This was only resolved by closing down the R session and starting again. I'm sure it is only because I'm making a small mistake with readxl
, but unless I'm faced with the same situation as above with mis-formatted dates, I will stick with openxlsx
, which I have generally found straightforward to use.
Upvotes: 2