Reputation: 11
I am trying to scrape job titles from a given URL, but the result is empty. Any suggestions would be appreciated; I am a beginner and feel a bit lost. This is the code I am running:
library(rvest)
library(xml2)
library(dplyr)
library(stringr)
library(httr)
links <- "https://es.indeed.com/jobs?q=ingeniero+energ%C3%ADas+renovables&start=10"
page <- read_html(links)
titulo <- page %>%
  html_nodes(".jobtitle") %>%
  html_text(trim = TRUE)
Upvotes: 0
Views: 506
Reputation: 5673
I advise you to learn a bit of CSS and XPath before trying to scrape. Then use your web browser's element inspector to understand the HTML structure of the page.
On your page, each title is an h2 element of class title, containing an a element that holds the title you want in its title attribute. Using XPath, you can do:
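page <- read_html(links)
page %>%
  # Select every h2 whose class is exactly "title"
  html_nodes(xpath = "//h2[@class = 'title']") %>%
  # The leading "." keeps the search inside each h2; a bare "//a" would
  # search the whole document again
  html_nodes(xpath = ".//a[starts-with(@class, 'jobtitle')]") %>%
  html_attr("title")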
[1] "Estudiante Ingeniería Eléctrica o Energías Renovables VALLADOLID"
[2] "Ingeniero Eléctrico Diseño ePLAN - Energías Renovables"
[3] "INVESTIGADOR ENERGÍA: Energías renovables ámbitos eléctricos, térmicos y construcción sostenible"
[4] "PROGRAMADOR/A JUNIOR EN ZARAGOZA"
[5] "ingeniero/a electrico"
[6] "Ingeniero/a Ofertas O&M Energía"
[7] "Ingeniero de Desarrollo de Negocio"
[8] "SOPORTE ADMINISTRATIVO DE COMPRAS"
[9] "Ingeniero de Planificación"
[10] "Ingeniero Geotécnico"
[11] "Project Manager Energías Renovables (Pontevedra)"
[12] "Ingeniero/a Cálculo Estructural ANSYS CLASSIC"
[13] "Project Manager SCADA Energía Renovables"
[14] "Ingeniero de Servicio Tecnico Comercial"
[15] "FORMADOR/A CERTIFICADO DE PROFESIONALIDAD ENAE0111-OPERACIONES BÁSICAS EN EL MONTAJE Y MANTENIMIENTO DE INSTALACIONES DE ENERGÍAS RENOVABLES, HUELVA"
Here I use starts-with in the second XPath because the class of the a element is a bit complicated: it is presumably generated by the website itself and could change in the future. We can hope, though, that it will always start with jobtitle.
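To see why starts-with helps, here is a minimal sketch that runs without touching the website. The HTML string is a made-up imitation of the structure described above: the extra class names (turnstileLink, visited, other), the variable names, and the titles' placement are assumptions for illustration, not the real Indeed markup.

```r
library(rvest)

# Hypothetical markup: two titles inside h2.title, plus one stray link
# outside any h2. The class names after "jobtitle" are invented.
toy_html <- '
<html><body>
  <h2 class="title">
    <a class="jobtitle turnstileLink" title="Ingeniero de Planificacion" href="#">Ingeniero de ...</a>
  </h2>
  <h2 class="title">
    <a class="jobtitle turnstileLink visited" title="Ingeniero Geotecnico" href="#">Ingeniero Geo...</a>
  </h2>
  <div><a class="jobtitle other" title="Not a job title" href="#">x</a></div>
</body></html>'

toy <- read_html(toy_html)

# An exact comparison fails, because the class attribute holds more than "jobtitle"
exact <- toy %>%
  html_nodes(xpath = "//a[@class = 'jobtitle']") %>%
  html_attr("title")

# starts-with tolerates the extra class names; ".//" restricts the search
# to links inside the h2 elements selected in the previous step
titles <- toy %>%
  html_nodes(xpath = "//h2[@class = 'title']") %>%
  html_nodes(xpath = ".//a[starts-with(@class, 'jobtitle')]") %>%
  html_attr("title")

# The same selection written as a CSS selector ("^=" means "starts with")
titles_css <- toy %>%
  html_nodes("h2.title a[class^='jobtitle']") %>%
  html_attr("title")

print(length(exact))
print(titles)
print(titles_css)
```

On this toy input the exact comparison matches nothing, while both the XPath and the CSS version return the two titles inside the h2 elements and skip the stray link. Whether the real page still uses these class names is something to check with the element inspector.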
Upvotes: 1