Pino

Reputation: 31

Importing data from an Excel file online

I am trying to download an Excel file from the web and read only the lines that contain the word "ORD".

fileUrl <-("http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls")
x <- getURLContent(fileUrl)
out <- read.table(fileUrl,x )

I am using getURLContent() but get warnings at an early stage of the process:

Warning messages:

1: In read.table(fileUrl, x) : line 1 appears to contain embedded nulls  
2: In read.table(fileUrl, x) : line 2 appears to contain embedded nulls  
3: In read.table(fileUrl, x) : line 3 appears to contain embedded nulls  
4: In read.table(fileUrl, x) : line 4 appears to contain embedded nulls  
5: In read.table(fileUrl, x) : line 5 appears to contain embedded nulls  
6: In if (!header) rlabp <- FALSE :  
   the condition has length > 1 and only the first element will be used
7: In if (header) { :  
   the condition has length > 1 and only the first element will be used  
8: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : embedded nul(s) found in input

The table "out" comes out almost unreadable. Does anyone know how to read only the specific lines, rather than importing the whole file and risking the lines that cause these errors?

Upvotes: 0

Views: 105

Answers (2)

Pino

Reputation: 31

I just found a solution, thank you Tim for putting me in the right direction:

library(gdata)
DownloadURL <- "http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls"
out <- read.xls(DownloadURL, pattern="ORD", perl = "C:\\Perl64\\bin\\perl.exe")

Upvotes: 1

Tim Biegeleisen

Reputation: 522626

One of the answers to this SO question recommends downloading the Excel file from the web and then using read.xls() from the gdata library to read it into a data frame. Something like this:

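library(gdata)
# mode="wb" writes the download in binary mode, which matters for .xls files on Windows
download.file("http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls", destfile="file.xls", mode="wb")
# pattern skips every line before the first one that contains the given string
out <- read.xls("file.xls", header=TRUE, pattern="Some Pattern")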

The pattern argument tells read.xls() to skip everything before the first line in which Some Pattern appears. You can change the value to whatever string lets you skip the preliminary material before the actual data you want in your data frame.
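Note that pattern only decides where reading starts; it does not drop later rows that lack the string. If you want only the rows that mention "ORD", you could filter the data frame after reading it. Here is a minimal sketch of that idea. The data frame below is a made-up stand-in for the result of read.xls(), and its column names and values are assumptions, not the real layout of the HKEX report:

```r
# Stand-in for the data frame returned by read.xls(); the column names
# and values are invented for illustration only.
out <- data.frame(
  Company = c("Alpha Ltd", "Beta Ltd", "Gamma Ltd"),
  Class   = c("ORD", "PREF", "ORD H"),
  Shares  = c(1000, 2500, 400),
  stringsAsFactors = FALSE
)

# TRUE for each row in which at least one cell contains the literal text "ORD".
has_ord <- apply(out, 1, function(row) any(grepl("ORD", row, fixed = TRUE)))

# Keep only the matching rows.
ord_rows <- out[has_ord, , drop = FALSE]
print(ord_rows)
```

With the real file you would replace the stand-in with the out returned by read.xls(). If you know which column holds the share class, filtering on that single column with grepl() would be stricter than searching every cell.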

Upvotes: 1
