Hugo Acosta Martinez

Reputation: 11

Scraping job titles from Indeed

I am trying to scrape job titles from a given URL, but the result comes back empty. I am a beginner and a bit lost, so any suggestions would be appreciated. This is the code I am running:

    library(rvest)
    library(xml2)
    library(dplyr)
    library(stringr)
    library(httr)

    links <- "https://es.indeed.com/jobs?q=ingeniero+energ%C3%ADas+renovables&start=10"
    page <- read_html(links)
    titulo <- page %>%
      html_nodes(".jobtitle") %>%
      html_text(trim = TRUE)

Upvotes: 0

Views: 506

Answers (1)

denis

Reputation: 5673

I advise you to learn a bit of CSS and XPath before you start scraping. After that, use your web browser's element inspector to understand the HTML structure of the page.
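As a complement to the browser inspector, the same kind of check can be done from R. This is only a minimal sketch: the HTML fragment, the .job-title selector and the variable names are made up for illustration and are not Indeed's real markup. It shows that a selector matching nothing returns an empty node set, and how to list the class attributes that are actually present.

```r
library(rvest)

# Hypothetical markup standing in for a downloaded page (an assumption,
# not the real structure of the site).
doc <- read_html(
  '<div><h2 class="title"><a class="jobtitle big" title="Job A">Job A...</a></h2>
   <p class="summary">text</p></div>'
)

# A selector that matches nothing yields an empty node set, which is
# what produces an empty result further down the pipe.
n_missing <- length(html_nodes(doc, ".job-title"))

# List every element that carries a class attribute, with its tag name,
# to see which selectors are actually available.
nodes <- html_nodes(doc, xpath = "//*[@class]")
found <- data.frame(
  tag = html_name(nodes),
  class = html_attr(nodes, "class"),
  stringsAsFactors = FALSE
)
print(n_missing)
print(found)
```

Running the same two checks against the real page object should show quickly whether a selector is wrong or whether the downloaded HTML differs from what the browser displays.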

On your page, each job title sits in an h2 element of class title, which contains an a element; the text you want is in that element's title attribute. Using XPath, you can extract it like this:

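    page = read_html(links)
    page %>%
      # Select every h2 whose class attribute is exactly "title"
      html_nodes(xpath = "//h2[@class = 'title']") %>%
      # The leading "." keeps the search inside each h2; a bare "//a" would search the whole document
      html_nodes(xpath = ".//a[starts-with(@class, 'jobtitle')]") %>%
      html_attr("title")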

     [1] "Estudiante Ingeniería Eléctrica o Energías Renovables VALLADOLID"
     [2] "Ingeniero Eléctrico Diseño ePLAN - Energías Renovables"
     [3] "INVESTIGADOR ENERGÍA: Energías renovables ámbitos eléctricos, térmicos y construcción sostenible"
     [4] "PROGRAMADOR/A JUNIOR EN ZARAGOZA"
     [5] "ingeniero/a electrico"
     [6] "Ingeniero/a Ofertas O&M Energía"
     [7] "Ingeniero de Desarrollo de Negocio"
     [8] "SOPORTE ADMINISTRATIVO DE COMPRAS"
     [9] "Ingeniero de Planificación"
    [10] "Ingeniero Geotécnico"
    [11] "Project Manager Energías Renovables (Pontevedra)"
    [12] "Ingeniero/a Cálculo Estructural ANSYS CLASSIC"
    [13] "Project Manager SCADA Energía Renovables"
    [14] "Ingeniero de Servicio Tecnico Comercial"
    [15] "FORMADOR/A CERTIFICADO DE PROFESIONALIDAD ENAE0111-OPERACIONES BÁSICAS EN EL MONTAJE Y MANTENIMIENTO DE INSTALACIONES DE ENERGÍAS RENOVABLES, HUELVA"

I use starts-with in the second XPath because the class attribute of the a element is long, is most likely generated by the website itself, and could change in the future. The hope is that it will always start with jobtitle.
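To try the two-step selection without hitting the website, here is a small self-contained sketch. The HTML fragment, the class names after jobtitle and the variable names are assumptions made for illustration; the real page may differ. It searches relative to each h2 (XPath starting with ".//") and compares that with an absolute "//a" search, which starts again from the document root.

```r
library(rvest)

# Hypothetical fragment imitating the structure described above.
snippet <- '
<div>
  <h2 class="title">
    <a class="jobtitle turnstileLink" title="Ingeniero de Desarrollo de Negocio" href="#">Ingeniero de ...</a>
  </h2>
  <h2 class="title">
    <a class="jobtitle turnstileLink" title="Project Manager SCADA" href="#">Project ...</a>
  </h2>
  <a class="jobtitle-other" title="Not inside a heading" href="#">outside h2</a>
</div>'

doc <- read_html(snippet)
headings <- html_nodes(doc, xpath = "//h2[@class = 'title']")

# Relative search: only anchors nested inside the selected h2 elements.
titles <- headings %>%
  html_nodes(xpath = ".//a[starts-with(@class, 'jobtitle')]") %>%
  html_attr("title")

# Absolute search: "//a" ignores the h2 context, so it can also pick up
# the anchor that sits outside the headings.
titles_absolute <- headings %>%
  html_nodes(xpath = "//a[starts-with(@class, 'jobtitle')]") %>%
  html_attr("title")

print(titles)
print(titles_absolute)
```

Note that starts-with compares against the whole class attribute string, so it only matches when jobtitle is the first class listed.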

Upvotes: 1
