Reputation: 539
I would like to extract the table (table 4) from the URL "http://www.moneycontrol.com/financials/oilnaturalgascorporation/profit-loss/IP02". The catch is that I have to use RSelenium.
Now here is the code I am using:
library(RSelenium)
library(XML)
URL <- "http://www.moneycontrol.com/financials/oilnaturalgascorporation/profit-loss/IP02"
remDr$navigate(URL)
doc <- htmlParse(remDr$getPageSource()[[1]])
x <- readHTMLTable(doc)
The above code is not able to extract table 4. However, when I do not use RSelenium, as below, I am able to extract the table easily:
download.file(URL, 'quote.html')
doc <- htmlParse('quote.html')
x <- readHTMLTable(doc, which = 5)
Please let me know the solution, as I have been stuck on this part for a month now. I appreciate your suggestions.
Upvotes: 1
Views: 867
Reputation: 1
I'm struggling with more or less the same issue. I'm trying to come up with a solution that doesn't use htmlParse, for example (after navigating to the page):
table <- remDr$findElements(using = "tag name", value = "table")
You might have to use a CSS or XPath selector on yours; that next step is the part I'm still working on.
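A rough sketch of where that could go (assuming remDr is an open RSelenium session already on the target page, and that pulling the element's outerHTML attribute and re-parsing it with the XML package works for this site; untested here):
library(RSelenium)
library(XML)
# Sketch only: remDr is assumed to be an active session on the target page
tables <- remDr$findElements(using = "tag name", value = "table")
# Grab the raw HTML of one <table> element and hand it to readHTMLTable()
tblHtml <- tables[[1]]$getElementAttribute("outerHTML")[[1]]
x <- readHTMLTable(htmlParse(tblHtml, asText = TRUE))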
I finally got a table downloaded into a nice little data frame. It seems easy once you get it figured out. Following the help page from the XML package:
library(RSelenium)
library(XML)
u <- 'http://www.w3schools.com/html/html_tables.asp'
doc <- htmlParse(u)                        # parse the page into an HTML document
tableNodes <- getNodeSet(doc, "//table")   # collect every <table> node
tb <- readHTMLTable(tableNodes[[1]])       # convert the first table to a data frame
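Adapting the same idea to the RSelenium session from the question should look something like this (a sketch, assuming remDr has already navigated to the target URL; the table index is a guess, so check length(tableNodes) first):
doc <- htmlParse(remDr$getPageSource()[[1]], asText = TRUE)
tableNodes <- getNodeSet(doc, "//table")   # every <table> node in the rendered page
tb <- readHTMLTable(tableNodes[[4]])       # index assumed; inspect length(tableNodes)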
Upvotes: 0
Reputation: 1910
I think it works fine. The table you were able to get using download.file can also be retrieved with RSelenium using the following code:
readHTMLTable(htmlParse(remDr$getPageSource()[[1]], asText = TRUE), header = TRUE, which = 6)
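For completeness, a minimal end-to-end sketch of that approach (assuming a Selenium server is already running locally on port 4444; the browser name and the which index are assumptions that may need adjusting):
library(RSelenium)
library(XML)
# Connect to a locally running Selenium server (port/browser are assumptions)
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L,
                      browserName = "firefox")
remDr$open()
remDr$navigate("http://www.moneycontrol.com/financials/oilnaturalgascorporation/profit-loss/IP02")
x <- readHTMLTable(htmlParse(remDr$getPageSource()[[1]], asText = TRUE),
                   header = TRUE, which = 6)
remDr$close()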
Hope that helps!
Upvotes: 1
Reputation: 539
I found the solution. In my case, I had to first navigate to the inner frame (boxBg1) before I could extract the outer HTML and then use the readHTMLTable function. It works fine now. I will post again if I run into a similar issue in the future.
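Roughly, something like this (a sketch based on the description above; whether boxBg1 is a real frame or just a container element, and the class-name selector, are assumptions):
library(RSelenium)
library(XML)
remDr$navigate(URL)
# Locate the inner container first, then parse only its HTML
inner <- remDr$findElement(using = "class name", value = "boxBg1")
innerHtml <- inner$getElementAttribute("outerHTML")[[1]]
x <- readHTMLTable(htmlParse(innerHtml, asText = TRUE))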
Upvotes: 0