Isabel
Isabel

Reputation: 95

Issue downloading and opening xlsx-file from within R

I would like to download and open the following Excel-file with monthly and annual consumer price indices directly from within R.

https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master (the link can be found on this site: https://www.bfs.admin.ch/bfs/de/home/statistiken/preise/landesindex-konsumentenpreise/lik-resultate.assetdetail.7066959.html)

I used to download this file manually using the browser, save it locally on my computer, then open the xlsx-file with R and work with the data without any problems.

I have now tried to read the file directly from within R, but without luck so far. As you can see from the URL above, there is no .xlsx extension or the like, so I figured the file is zipped somehow. Here is what I've tried so far and where I am stuck.

library(foreign)
library(xlsx)


# in a browser, this links opens or dowloads an xlsx file
likurl <- "https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master"

temp <- tempfile()
download.file(likurl, temp)

list.files <- unzip(temp,list=TRUE)

data <- read.xlsx(unz(temp,
                  + list.files$Name[8]), sheetIndex=2)

The result from the last step is

 Error in +list.files$Name[8] : invalid argument to unary operator

I do not really understand the unz function, but can see this is somehow wrong when reading the help file for unz (I found this suggested solution somewhere online).

I also tried the following, different approach:

library(XLConnect)
likurl <- "https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master"

tmp = tempfile(fileext = ".xlsx")
download.file(likurl, tmp)
readWorksheetFromFile(tmp, sheet = 2, startRow = 4, 
                  colNames = TRUE, rowNames = FALSE)

with the last line returning as result:

 Error: ZipException (Java): invalid entry size (expected 1644 but got 1668 bytes)

I would greatly appreciate any help on how I can open this data and work with it as usual when reading in data from excel into R. Thanks a lot in advance!

Upvotes: 0

Views: 332

Answers (1)

Isabel
Isabel

Reputation: 95

Here's my solution thanks to the hint by @Johnny. Reading the data from excel worked better with read.xlsx from the xlsx-package (instead of read_excel as suggested in the link above).

Some ugly details still remain with how the columns are named (colNames are not passed on correctly, except for the first and 11th column) and how strangely new columns are created from the options passed to read.xlsx (e.g., a column named colNames, with all entries == TRUE; for details, see the output structure with str(LIK.m)). However, these would be for another question and for the moment, they can be fixed in the quick and dirty way :-).

library(httr)
library(foreign)
library(xlsx)

# in a browser, this links opens or dowloads an xlsx file
likurl<-'https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master'
p1f <- tempfile()
download.file(likurl, p1f, mode="wb")

GET(likurl, write_disk(tf <- tempfile(fileext = ".xlsx")))


# annual CPI
LIK.y <- read.xlsx(tf,
             sheetIndex = 2, startRow = 4,
             colNames = TRUE, rowNames = FALSE, stringsAsFactors = FALSE,
             detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE ,  
             na.strings = "NA", check.names = TRUE,  fillMergedCells = FALSE)

LIK.y$X. <- as.numeric(LIK.y$X.)

str(LIK.y)


# monthly CPI
LIK.m <- read.xlsx(tf,
             sheetIndex = 1, startRow = 4,
             colNames = TRUE, rowNames = FALSE, stringsAsFactors = FALSE,
             detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE ,  
             na.strings = "NA", check.names = TRUE,  fillMergedCells = FALSE)

LIK.m$X. <- as.numeric(LIK.m$X.)

str(LIK.m)

Upvotes: 1

Related Questions