motipai

Reputation: 328

Web scraping a table to get the hrefs inside it using rvest returns an empty table

I am trying to get all hrefs inside a table using rvest. What I have done so far:

library(rvest)
library(stringr)
library(tidyverse)


url <- "https://br.advfn.com/bolsa-de-valores/bovespa/suzano-on-SUZB3/opcoes"

html1 <- read_html(url)

tbls2 <- html1 %>%
               html_nodes("#options-table") %>%
               html_table(fill = TRUE) %>%
               .[[1]]

It returns tbls2 as:

[1] Ativo              Tipo               Preço de Exercício Variação (%)      
[5] Volume             Vencimento         Modelo            
<0 rows> (or 0-length row.names)

The hrefs should be in each element of column Ativo. Why is this returning an empty table?

Upvotes: 1

Views: 89

Answers (1)

Chris

Reputation: 3986

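If you view the page source and navigate to the table with id 'options-table', you'll see that the table body is empty. That's because the rows are filled in by JavaScript, which loads them from an external data source after the page arrives. read_html() only downloads the HTML the server sends and does not run JavaScript, so html_table() sees nothing but the header row, and there are no <a> elements to take hrefs from.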
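To see the effect without hitting the site, here is a minimal sketch that parses two hypothetical, simplified HTML strings: one mimicking what the server sends (header row, empty <tbody>) and one mimicking what the browser might show after JavaScript has added a row. Only the id options-table and the Ativo/Tipo headers come from the question; the markup itself is an assumption for illustration.

```r
library(rvest)  # also re-exports read_html() from xml2 and the %>% pipe

# Hypothetical markup: header row present, <tbody> empty, as served
served <- '
<table id="options-table">
  <thead><tr><th>Ativo</th><th>Tipo</th></tr></thead>
  <tbody></tbody>
</table>'

# Hypothetical markup after JavaScript has inserted one illustrative row
rendered <- '
<table id="options-table">
  <thead><tr><th>Ativo</th><th>Tipo</th></tr></thead>
  <tbody>
    <tr><td><a href="/p.php?pid=quote&amp;symbol=BOV%5ESUZBL400">SUZBL400</a></td><td>Call</td></tr>
  </tbody>
</table>'

# The served version parses to a table with headers but no data rows
served_tbl <- read_html(served) %>% html_node("#options-table") %>% html_table()

# ...and contains no links to extract
served_links <- read_html(served) %>%
  html_nodes("#options-table a") %>%
  html_attr("href")

# If the rows were in the static HTML, this is how the hrefs would be read
rendered_links <- read_html(rendered) %>%
  html_nodes("#options-table td a") %>%
  html_attr("href")

nrow(served_tbl)      # 0
length(served_links)  # 0
rendered_links        # "/p.php?pid=quote&symbol=BOV%5ESUZBL400"
```

The html_nodes()/html_attr("href") pattern is what you would use on a static table; here it has nothing to work with because the links only exist after the browser runs the page's scripts.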

If we open the Chrome developer tools, go to the Network tab and filter by XHR, we can see the API requests the page makes. In this case it's clear which one we want: the request to https://br.advfn.com/common/bov-options/api, which returns the option data as JSON.
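The request for SUZB3 has the form https://br.advfn.com/common/bov-options/api?symbol=SUZB3&_=1576171286512. The "_" value looks like a millisecond Unix timestamp, a common cache-busting parameter, but that is an assumption, as is the idea that other tickers can be requested by changing symbol. Under those assumptions, a small hypothetical helper can build the URL:

```r
# Hypothetical helper, not part of any package.
# Assumes the endpoint accepts any ticker in "symbol" and that "_" is just
# a cache-busting timestamp in milliseconds.
options_api_url <- function(symbol, timestamp_ms = NULL) {
  if (is.null(timestamp_ms)) {
    # Current time in milliseconds since the Unix epoch
    timestamp_ms <- round(as.numeric(Sys.time()) * 1000)
  }
  sprintf("https://br.advfn.com/common/bov-options/api?symbol=%s&_=%.0f",
          utils::URLencode(symbol, reserved = TRUE), timestamp_ms)
}

options_api_url("SUZB3", 1576171286512)
# "https://br.advfn.com/common/bov-options/api?symbol=SUZB3&_=1576171286512"
```

With network access, jsonlite::fromJSON(options_api_url("SUZB3")) would then fetch the same JSON, provided the endpoint still behaves as it did when this answer was written.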

Having discovered where the data is coming from, we can just read it in directly with jsonlite:

library(jsonlite)
# The JSON endpoint the page calls via XHR
url <- 'https://br.advfn.com/common/bov-options/api?symbol=SUZB3&_=1576171286512'
jsn <- fromJSON(url)
# The option rows are in the "result" element
df1 <- jsn$result

dplyr::glimpse(df1)

# Observations: 100
# Variables: 10
# $ symbol            <chr> "SUZBL400", "SUZBX365", "SUZBX283", "SUZBX288", "SUZBX300", "SUZBL386", "SUZBX298",…
# $ type              <chr> "Call", "Put", "Put", "Put", "Put", "Call", "Put", "Put", "Put", "Put", "Put", "Put…
# $ style             <chr> "A", "E", "E", "E", "E", "A", "E", "E", "E", "E", "E", "E", "E", "E", "E", "E", "A"…
# $ strike_price      <chr> "40,06", "36,56", "28,31", "28,81", "30,06", "38,56", "29,81", "30,56", "31,06", "3…
# $ expiry_date       <chr> "16/12/2019", "16/12/2019", "16/12/2019", "16/12/2019", "16/12/2019", "16/12/2019",…
# $ volume            <chr> "5000", "2000", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "0", ""…
# $ volume_form       <chr> "5.000", "2.000", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", …
# $ change_percentage <chr> "76,47%", "-16,67%", "0,0%", "0,0%", "0,0%", "0,0%", "0,0%", "0,0%", "0,0%", "0,0%"…
# $ url               <chr> "/p.php?pid=quote&symbol=BOV%5ESUZBL400", "/p.php?pid=quote&symbol=BOV%5ESUZBX365",…
# $ class             <chr> "up", "dn", "nc", "nc", "nc", "nc", "nc", "nc", "nc", "nc", "nc", "nc", "nc", "nc",…
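Every column comes back as character, and url holds the relative hrefs the question was after. A minimal sketch of tidying them follows; to run offline it uses two rows copied from the glimpse() output above (with the live data you would use df1), and parse_br_number() is a hypothetical helper that assumes "." is only ever a thousands separator and "," the decimal mark, as in the values shown.

```r
library(xml2)  # installed with rvest; provides url_absolute()

# Hypothetical helper for Brazilian-formatted numbers like "40,06" or "76,47%":
# drop "." thousands separators and "%" signs, then use "." as decimal mark
parse_br_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)
  x <- gsub("%", "", x, fixed = TRUE)
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

# Two rows taken from the glimpse() output above
df1_sample <- data.frame(
  symbol = c("SUZBL400", "SUZBX365"),
  strike_price = c("40,06", "36,56"),
  expiry_date = c("16/12/2019", "16/12/2019"),
  change_percentage = c("76,47%", "-16,67%"),
  url = c("/p.php?pid=quote&symbol=BOV%5ESUZBL400",
          "/p.php?pid=quote&symbol=BOV%5ESUZBX365"),
  stringsAsFactors = FALSE
)

tidy_options <- transform(
  df1_sample,
  strike_price = parse_br_number(strike_price),
  change_percentage = parse_br_number(change_percentage),
  expiry_date = as.Date(expiry_date, format = "%d/%m/%Y"),
  # Resolve the relative hrefs against the site's host
  link = url_absolute(url, "https://br.advfn.com/")
)

tidy_options$strike_price       # 40.06 36.56
tidy_options$change_percentage  # 76.47 -16.67
```

url_absolute() is used instead of pasting strings so that hrefs with a different form (for example already absolute) would still resolve correctly.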


Upvotes: 3
