Reputation: 13
I am using R for web scraping. The information I need is behind the links on this webpage, but clicking a link appears to lead back to the same page I was on. How can I follow these links and scrape until I reach the tables with the information I need? I started using R several months ago and I know httr, Curl and other packages, but I have not been able to scrape this site. I need output like the following (obtained by clicking "Todo el territorio" and choosing Tipo de estudios: "Bachillerato"):
Provincia|Localidad|Denominacion Generica|Denominacion Especifica|Codigo|Naturaleza
Almería|Adra|Instituto de Educación Secundaria|Abdera|04000110|Centro público
Almería|Adra|Instituto de Educación Secundaria|Gaviota|04000134|Centro público
...
This is my general script using the RSelenium package. It does not work, and I am open to any alternative:
library(RSelenium)
library(XML)
library(magrittr)
RSelenium::checkForServer()
RSelenium::startServer()
remDrv <- RSelenium::remoteDriver(remoteServerAddr = "localhost", port = 4444, browserName = "chrome")
remDrv$open()
remDrv$navigate('https://www.educacion.gob.es/centros/selectaut.do')
remDrv$findElement(using = "xpath", "//select[@name = '.listado-inicio']/option[@value = ('02','00')]")$clickElement()
...
or something like this. I found similar scripts in other Stack Overflow questions, but I cannot get anything to work. I am open to solutions that use other approaches. Thanks a lot.
Upvotes: 1
Views: 1230
Reputation: 21443
Using 'RSelenium' to navigate the site you could do:
library(RSelenium)
library(XML)  # htmlParse() and readHTMLTable() used below come from XML, not rvest
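# start RSelenium (checkForServer()/startServer() are defunct in newer RSelenium releases, where rsDriver() replaces them)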
checkForServer()
startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate('https://www.educacion.gob.es/centros/selectaut.do')
#Click on the todo el territorio link
remDr$findElement(using = "xpath", "//a[text()='Todo el territorio']")$clickElement()
#select the Bachillerato option (has a value of 133) and click on the search button
remDr$findElement(using = "xpath", "//select[@id='comboniv']/option[@value='133']")$clickElement()
remDr$findElement(using = "xpath", "//input[@id='idGhost']")$clickElement()
#Click on the show results button
remDr$findElement(using = "xpath", "//input[@title='Buscar']")$clickElement()
# parse the rendered HTML and extract the results table
doc <- htmlParse(remDr$getPageSource()[[1]], encoding = "UTF-8")
data <- readHTMLTable(doc)$matcentro
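To get from `data` to the pipe-delimited layout shown in the question, one option is base R's `write.table()`. This is a minimal sketch, assuming the scraped table has the six columns listed in the question. The `centros` data frame is a hand-built stand-in (two rows copied from the question's sample output), not actual scraped data, and the name `pipe_lines` is only illustrative.

```r
# Hypothetical stand-in for readHTMLTable(doc)$matcentro; the rows are copied
# from the sample output in the question, not scraped from the site.
centros <- data.frame(
  Provincia = c("Almería", "Almería"),
  Localidad = c("Adra", "Adra"),
  "Denominacion Generica" = rep("Instituto de Educación Secundaria", 2),
  "Denominacion Especifica" = c("Abdera", "Gaviota"),
  Codigo = c("04000110", "04000134"),  # kept as text so leading zeros survive
  Naturaleza = rep("Centro público", 2),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# write.table() prints to the console when no file is given;
# capture.output() collects those lines into a character vector.
pipe_lines <- capture.output(
  write.table(centros, sep = "|", quote = FALSE, row.names = FALSE)
)
cat(pipe_lines, sep = "\n")
```

Passing a `file` argument to `write.table()` (for example a hypothetical `"centros.txt"`) would write the same lines to disk instead of the console.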
Upvotes: 1