Reputation: 59
I need to extract the table with the columns "R.U.T" and "Entidad" from the page
http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554
I wrote the following code:
library(rvest)
#put page
url<-paste("http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554.html",sep="")
url<-read_html(url)
#extract table
table<-html_node(url,xpath='//*[@id="listado_fiscalizados"]/table') #xpath
table<-html_table(table)
#transform table to data.frame
table<-data.frame(table)
but R shows me the following result:
> a
{xml_nodeset (0)}
That is, it is not recognizing the table. Maybe that is because the table contains hyperlinks?
If anyone knows how to extract the table, I would appreciate the help. Many thanks in advance, and sorry for my English.
Upvotes: 1
Views: 646
Reputation: 78792
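The page makes an XHR request to another resource, and the response to that request is what populates the table. The static HTML you downloaded does not contain the table, which is why your XPath matches nothing.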
library(rvest)
library(dplyr)
pg <- read_html("http://www.svs.cl/institucional/mercados/consulta.php?mercado=S&Estado=VI&consulta=CSVID&_=1484105706447")
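html_nodes(pg, "table") %>%
  html_table() %>%   # parse every <table> into a data frame
  .[[1]] %>%         # keep the first table
  as_tibble() %>%    # tbl_df() is deprecated in current dplyr
  select(1:2)        # keep only R.U.T. and Entidad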
## # A tibble: 36 × 2
##    R.U.T.     Entidad
##    <chr>      <chr>
##  1 99588060-1 ACE SEGUROS DE VIDA S.A.
##  2 76511423-3 ALEMANA SEGUROS S.A.
##  3 96917990-3 BANCHILE SEGUROS DE VIDA S.A.
##  4 96933770-3 BBVA SEGUROS DE VIDA S.A.
##  5 96573600-K BCI SEGUROS VIDA S.A.
##  6 96656410-5 BICE VIDA COMPAÑIA DE SEGUROS S.A.
##  7 96837630-6 BNP PARIBAS CARDIF SEGUROS DE VIDA S.A.
##  8 76418751-2 BTG PACTUAL CHILE S.A. COMPAÑIA DE SEGUROS DE VIDA
##  9 76477116-8 CF SEGUROS DE VIDA S.A.
## 10 99185000-7 CHILENA CONSOLIDADA SEGUROS DE VIDA S.A.
## # ... with 26 more rows
You can use Developer Tools in any modern browser to monitor the Network requests to find that URL.
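To illustrate why the original XPath came back empty, here is a minimal offline sketch. Both HTML strings are made-up stand-ins (assumptions, not the real SVS markup): one mimics a static page whose container is still empty, the other mimics the kind of table the XHR response carries. The row values are invented for illustration.

```r
library(rvest)

# Hypothetical stand-in for the static page: the container exists but is
# empty, because the real page fills it in later with JavaScript.
static_page <- read_html(
  '<html><body><div id="listado_fiscalizados"></div></body></html>'
)

# Hypothetical stand-in for the XHR response that carries the actual table.
xhr_response <- read_html(
  '<table>
     <tr><th>R.U.T.</th><th>Entidad</th><th>Vigencia</th></tr>
     <tr><td>11111111-1</td><td>EXAMPLE SEGUROS S.A.</td><td>VI</td></tr>
   </table>'
)

# The XPath from the question finds nothing in the static document.
empty_hit <- html_nodes(static_page,
                        xpath = '//*[@id="listado_fiscalizados"]/table')
length(empty_hit)

# The same kind of query against the XHR-style markup does find a table.
tables <- html_table(html_nodes(xhr_response, "table"))
result <- tables[[1]][, 1:2]  # keep the first two columns
result
```

An empty node set from a selector that looks correct in the browser's inspector is usually a hint that the content is injected after page load, so checking the Network tab is a reasonable next step.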
Upvotes: 2
Reputation: 1563
This is the answer using RSelenium:
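library(RSelenium)
library(XML)  # provides htmlParse() and readHTMLTable()
# Start Selenium Server
# (checkForServer() and startServer() belong to older RSelenium releases; newer releases provide rsDriver() instead)
RSelenium::checkForServer(beta = TRUE)
selServ <- RSelenium::startServer(javaargs = c("-Dwebdriver.gecko.driver=\"C:/Users/Mislav/Documents/geckodriver.exe\""))
remDr <- remoteDriver(extraCapabilities = list(marionette = TRUE))
remDr$open() # silent = TRUE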
Sys.sleep(2)
# Load the page in the browser session and give its JavaScript time to render the table
remDr$navigate("http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554.html")
Sys.sleep(2)
doc <- htmlParse(remDr$getPageSource()[[1]], encoding = "UTF-8")
# close and stop server
remDr$close()
selServ$stop()
tables <- readHTMLTable(doc)
head(tables)
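If you would rather stay with rvest for the parsing step, the rendered page source can be handed to read_html() as a string. This is a swapped-in alternative to htmlParse() and readHTMLTable(), not part of the answer above. The sketch below uses a made-up page_source string in place of remDr$getPageSource()[[1]], so the markup and the row values are assumptions.

```r
library(rvest)

# Hypothetical stand-in for remDr$getPageSource()[[1]]: the page after the
# browser has run its JavaScript, so the container now holds the table.
page_source <- '<html><body><div id="listado_fiscalizados"><table>
<tr><th>R.U.T.</th><th>Entidad</th></tr>
<tr><td>11111111-1</td><td>EXAMPLE SEGUROS S.A.</td></tr>
</table></div></body></html>'

# read_html() accepts a literal HTML string as well as a URL.
doc <- read_html(page_source)

# With the rendered markup, the XPath from the question now matches.
node <- html_node(doc, xpath = '//*[@id="listado_fiscalizados"]/table')
rendered_table <- html_table(node)
rendered_table
```

Whether the question's XPath matches the real rendered page depends on the markup the site actually produces, so inspect the rendered source before relying on it.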
Upvotes: 1