Reputation: 31
I am trying to download an Excel file from the web and read only the rows that contain the word "ORD".
library(RCurl)  # provides getURLContent()

fileUrl <- "http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls"
x <- getURLContent(fileUrl)
out <- read.table(fileUrl, x)
I am using getURLContent(), but I get the following warnings early in the process:
Warning messages:
1: In read.table(fileUrl, x) : line 1 appears to contain embedded nulls
2: In read.table(fileUrl, x) : line 2 appears to contain embedded nulls
3: In read.table(fileUrl, x) : line 3 appears to contain embedded nulls
4: In read.table(fileUrl, x) : line 4 appears to contain embedded nulls
5: In read.table(fileUrl, x) : line 5 appears to contain embedded nulls
6: In if (!header) rlabp <- FALSE : the condition has length > 1 and only the first element will be used
7: In if (header) { : the condition has length > 1 and only the first element will be used
8: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input
The resulting table "out" is almost unreadable. Does anyone know how to read only the specific lines I need, rather than importing the whole file and risking these errors?
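The embedded-nul warnings are consistent with a legacy .xls file being a binary workbook (an OLE2 compound file) rather than delimited text, which is what read.table() expects. (Warnings 6 and 7 arise because x is passed positionally as read.table()'s second argument, header.) As a rough sketch, assuming the download is a genuine binary workbook, you can check for the 8-byte OLE2 signature; the helper name is_ole2() and the local path "file.xls" are hypothetical:

```r
# Sketch: check whether a raw byte vector begins with the 8-byte OLE2
# compound-file signature (D0 CF 11 E0 A1 B1 1A E1) used by legacy .xls
# workbooks. Binary content like this, with its many zero bytes, is what
# triggers read.table()'s "embedded nul" warnings.
ole2_signature <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))

# is_ole2() is a hypothetical helper, not part of any package
is_ole2 <- function(bytes) {
  length(bytes) >= 8 && identical(bytes[1:8], ole2_signature)
}

# Usage on a downloaded file (hypothetical local path):
# is_ole2(readBin("file.xls", what = "raw", n = 8))
```

If this returns TRUE, the file needs an Excel reader (such as gdata's read.xls()) rather than read.table().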
Upvotes: 0
Views: 105
Reputation: 31
I just found a solution; thank you, Tim, for pointing me in the right direction:
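library(gdata)  # read.xls() needs a working Perl installation

DownloadURL <- "http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls"
# pattern = "ORD": skip everything before the first line containing "ORD"
# perl: location of perl.exe on this (Windows) machine; adjust as needed
out <- read.xls(DownloadURL, pattern = "ORD", perl = "C:\\Perl64\\bin\\perl.exe")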
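Note that pattern only skips the lines before the first match; rows further down that do not contain "ORD" are still read. If you want strictly the rows containing "ORD", one option is to filter after reading. A minimal sketch, where rows_containing() is a hypothetical helper and the toy data frame merely stands in for the result of read.xls():

```r
# Sketch: keep only the rows of a data frame in which at least one cell
# contains the literal string `word` (case-sensitive, no regex).
# rows_containing() is a hypothetical helper; the data below is made up.
rows_containing <- function(df, word) {
  # apply() converts the data frame to a character matrix, row by row
  hit <- apply(df, 1, function(row) any(grepl(word, row, fixed = TRUE)))
  df[hit, , drop = FALSE]
}

out <- data.frame(
  company = c("Alpha Ltd", "Beta Holdings", "Gamma Corp"),
  class   = c("ORD", "PREF", "ORD"),
  shares  = c(1000, 500, 250),
  stringsAsFactors = FALSE
)

ord_rows <- rows_containing(out, "ORD")  # keeps the "Alpha Ltd" and "Gamma Corp" rows
```

fixed = TRUE treats "ORD" as a literal string; drop it if you need a regular expression.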
Upvotes: 1
Reputation: 522626
One of the answers to this SO question recommends downloading the Excel file from the web and then reading it into a data frame with read.xls() from the gdata package. Something like this:
library(gdata)
# mode = "wb" keeps the binary .xls file intact on Windows
download.file("http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls",
              destfile = "file.xls", mode = "wb")
out <- read.xls("file.xls", header = TRUE, pattern = "Some Pattern")
The pattern flag tells read.xls() to ignore everything until the first line in which Some Pattern appears. You can change the value to something that lets you skip the preliminary material before the actual data you want in your data frame.
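The same "skip until the first matching line" idea can be illustrated in base R on plain CSV text. This is only an analogous sketch, not how read.xls() is implemented; read_from_pattern() is a hypothetical helper and the sample lines are made up:

```r
# Sketch: drop every line before the first one containing `pattern`
# (literal match), then parse the remainder with read.csv().
# read_from_pattern() is a hypothetical helper; the sample text is made up.
read_from_pattern <- function(lines, pattern, ...) {
  start <- grep(pattern, lines, fixed = TRUE)[1]
  if (is.na(start)) stop("pattern not found: ", pattern)
  read.csv(text = lines[start:length(lines)], ...)
}

raw_lines <- c(
  "Report title (preamble)",
  "Some notes (preamble)",
  "",
  "Company,Class,Shares",
  "Alpha Ltd,ORD,1000",
  "Gamma Corp,ORD,250"
)

# The first line containing "Company" becomes the header row
df <- read_from_pattern(raw_lines, "Company", stringsAsFactors = FALSE)
```

As with read.xls(), the matching line itself is kept and used as the header, so pick a pattern that occurs in the header row of the data you want.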
Upvotes: 1