user119144

Reputation: 59

Web Scraping, extract table of a page

I need to extract the table with the columns "R.U.T." and "Entidad" from the page

http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554

I wrote the following code:

library(rvest)

# read the page
url <- "http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554.html"
url <- read_html(url)

# extract the table
table <- html_node(url, xpath = '//*[@id="listado_fiscalizados"]/table')
table <- html_table(table)

# transform the table into a data.frame
table <- data.frame(table)

but R shows me the following result:

> table
{xml_nodeset (0)}

That is, it is not recognizing the table. Maybe it's because the table has hyperlinks?

If anyone knows how to extract the table, I would appreciate it. Many thanks in advance and sorry for my English.

Upvotes: 1

Views: 646

Answers (2)

hrbrmstr

Reputation: 78792

The page makes an XHR request to another resource, which is used to build the table.

library(rvest)
library(dplyr)

pg <- read_html("http://www.svs.cl/institucional/mercados/consulta.php?mercado=S&Estado=VI&consulta=CSVID&_=1484105706447")

html_nodes(pg, "table") %>%
  html_table() %>%
  .[[1]] %>%
  tbl_df() %>%
  select(1:2)
## # A tibble: 36 × 2
##        R.U.T.                                            Entidad
##         <chr>                                              <chr>
## 1  99588060-1                           ACE SEGUROS DE VIDA S.A.
## 2  76511423-3                               ALEMANA SEGUROS S.A.
## 3  96917990-3                      BANCHILE SEGUROS DE VIDA S.A.
## 4  96933770-3                          BBVA SEGUROS DE VIDA S.A.
## 5  96573600-K                              BCI SEGUROS VIDA S.A.
## 6  96656410-5                 BICE VIDA COMPAÑIA DE SEGUROS S.A.
## 7  96837630-6            BNP PARIBAS CARDIF SEGUROS DE VIDA S.A.
## 8  76418751-2 BTG PACTUAL CHILE S.A. COMPAÑIA DE SEGUROS DE VIDA
## 9  76477116-8                            CF SEGUROS DE VIDA S.A.
## 10 99185000-7           CHILENA CONSOLIDADA SEGUROS DE VIDA S.A.
## # ... with 26 more rows

You can use the Developer Tools in any modern browser to monitor network requests and find that URL.
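As a self-contained sketch of the same `html_table()` pipeline, here is the pattern run against an inline HTML snippet instead of the live endpoint (the two rows are copied from the output above; the snippet itself is invented for illustration):

```r
library(rvest)
library(dplyr)

# A minimal two-column table mimicking the structure of the SVS listing
html <- '<table>
  <tr><th>R.U.T.</th><th>Entidad</th></tr>
  <tr><td>99588060-1</td><td>ACE SEGUROS DE VIDA S.A.</td></tr>
  <tr><td>76511423-3</td><td>ALEMANA SEGUROS S.A.</td></tr>
</table>'

result <- read_html(html) %>%
  html_nodes("table") %>%   # collect all <table> nodes
  html_table() %>%          # parse each into a data frame
  .[[1]] %>%                # take the first (and only) table
  select(1:2)               # keep the first two columns

result
```

Once you have located the real XHR URL in the Network tab, passing it to `read_html()` in place of the inline snippet yields the full table the same way.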

Upvotes: 2

Mislav

Reputation: 1563

This is the answer using RSelenium:

# Load required packages
library(RSelenium) # remoteDriver(), server helpers
library(XML)       # htmlParse(), readHTMLTable()

# Start Selenium Server
RSelenium::checkForServer(beta = TRUE)
selServ <- RSelenium::startServer(javaargs = c("-Dwebdriver.gecko.driver=\"C:/Users/Mislav/Documents/geckodriver.exe\""))
remDr <- remoteDriver(extraCapabilities = list(marionette = TRUE))
remDr$open() # silent = TRUE
Sys.sleep(2)

# Simulate browser session and fill out form
remDr$navigate("http://www.svs.cl/portal/principal/605/w3-propertyvalue-18554.html")
Sys.sleep(2)
doc <- htmlParse(remDr$getPageSource()[[1]], encoding = "UTF-8")

# close and stop server
remDr$close()
selServ$stop()

tables <- readHTMLTable(doc)
head(tables)

Upvotes: 1
