Reputation: 59
I'm new to web scraping and am trying to scrape the following table:
<table class="dp-firmantes table table-condensed table-striped">
<thead>
<tr>
<th>FIRMANTE</th>
<th>DISTRITO</th>
<th>BLOQUE</th>
</tr>
</thead>
<tbody>
<tr>
<td>ROMERO, JUAN CARLOS</td>
<td>SALTA</td>
<td>JUSTICIALISTA 8 DE OCTUBRE</td>
</tr>
<tr>
<td>FIORE VIÑUALES, MARIA CRISTINA DEL VALLE</td>
<td>SALTA</td>
<td>PARES</td>
</tr>
</tbody>
</table>
I'm using the rvest package and my code is the following:
link <- read_html("https://www.hcdn.gob.ar/proyectos/resultados-buscador.html?")
table <- html_nodes(link, 'table.dp-firmantes table table-condensed table-striped')
But when I look at the 'table' object in R, I get an empty result instead of the table: {xml_nodeset (0)}
My instinct is that I'm not actually scraping any of the HTML content from the table, but I don't know why this is happening or how to fix it. I'm not sure if the error is in my R code, if I'm just using the wrong CSS selector, or if maybe the table is generated by JavaScript and is not in the HTML. Please let me know what I'm doing wrong here.
Edited: here is the link I'm using https://www.hcdn.gob.ar/proyectos/resultados-buscador.html
Edited: screenshot of the search results table
Upvotes: 0
Views: 812
Reputation: 5908
You could try the following code to parse the "Listado de Autores" tables for those bills that have them. For instance, the bill with expediente N. 820/18 (link = http://www.senado.gov.ar/parlamentario/comisiones/verExp/820.18/S/PL) has that table, but I scraped the first 500 bills and did not find any other bill with such data.
library(tidyverse)
library(rvest)
html_object <- read_html('http://www.senado.gov.ar/parlamentario/comisiones/verExp/820.18/S/PL')
html_object %>%
  html_node(xpath = "//div[@id = 'Autores']/table") %>% # This is the XPath address that worked for me. The CSS selector you provided did not work.
  html_table() %>% as_tibble() %>% # Get the HTML table and store it in a tibble (as_data_frame() is deprecated)
  mutate(X1 = gsub("\\n|\\t| {2,}", "", X1)) # Remove the extra line breaks (\\n), tabs (\\t), and runs of two or more spaces present in the HTML table; single spaces inside the names are kept.
Results:
# A tibble: 2 x 2
X1
<chr>
1 Romero, Juan Carlos
2 Fiore Viñuales, María Cristina Del Valle
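As for why the selector in the question returns an empty node set: in CSS a space means "descendant of", so 'table.dp-firmantes table table-condensed table-striped' asks for an element named table-striped nested inside other elements, which does not exist. Classes on the same element are chained with dots, or you can simply use one of them. Here is a minimal sketch that runs offline, assuming the page really contains the table exactly as pasted in the question (the inline HTML and the variable names are only for illustration):

```r
library(rvest)

# Inline copy of the table from the question, so no network access is needed.
page <- read_html('
<table class="dp-firmantes table table-condensed table-striped">
  <thead><tr><th>FIRMANTE</th><th>DISTRITO</th><th>BLOQUE</th></tr></thead>
  <tbody>
    <tr><td>ROMERO, JUAN CARLOS</td><td>SALTA</td><td>JUSTICIALISTA 8 DE OCTUBRE</td></tr>
    <tr><td>FIORE VIÑUALES, MARIA CRISTINA DEL VALLE</td><td>SALTA</td><td>PARES</td></tr>
  </tbody>
</table>')

# The selector from the question: the spaces make it a chain of descendants,
# ending in a (non-existent) <table-striped> element.
n_bad <- length(html_nodes(page, "table.dp-firmantes table table-condensed table-striped"))

# One class is enough to identify the table; all four could also be chained
# as "table.dp-firmantes.table.table-condensed.table-striped".
firmantes <- page %>%
  html_node("table.dp-firmantes") %>%
  html_table()

print(n_bad)
print(firmantes)
```

If the corrected selector still returns nothing on the live page, the table is probably not in the HTML that read_html() downloads.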
Edited: Screenshot of R's HTML capture through read_html('https://www.hcdn.gob.ar/proyectos/resultados-buscador.html?pagina=2')
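A quick way to check whether a table is present in what read_html() captures, rather than being added later by JavaScript, is to count the matching nodes in the parsed document. The sketch below uses a hypothetical helper, has_node(), and two inline stand-in pages (both are assumptions for illustration, not the real site) so it runs without a connection:

```r
library(rvest)

# Hypothetical helper: TRUE if the parsed document has at least one match.
has_node <- function(doc, css) length(html_nodes(doc, css)) > 0

# Stand-in for a page whose table is part of the downloaded HTML.
static_page <- read_html(
  '<html><body><table class="dp-firmantes"><tr><td>SALTA</td></tr></table></body></html>'
)

# Stand-in for a page whose table would only be built in the browser.
dynamic_page <- read_html(
  '<html><body><div id="resultados"></div><script>/* table built here */</script></body></html>'
)

print(has_node(static_page, "table.dp-firmantes"))   # table is in the HTML
print(has_node(dynamic_page, "table.dp-firmantes"))  # nothing for rvest to find
```

With the real site you would pass the result of read_html() on the URL instead of the stand-ins. If the check fails there, rvest alone cannot see the table and you would need the page that actually serves it, or a tool that runs the JavaScript.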
Upvotes: 1