Christoph

Reputation: 21

R: Web scraping from job portal

I am out of ideas trying to scrape data from a job portal. Basic usage of the rvest and xml2 packages has not gotten me to my goal: extracting the job title, company, location, release date, and the lower and upper salary bounds.

My first attempt is below.

library(dplyr)
library(rvest)
library(xml2)
Data <- read_html("https://gehaltsreporter.de/stellenangebote-jobs/?q=Immobilienkaufmann")

The SelectorGadget tool identifies the job title (among others "Immobilienkaufmann") as ".job-title", but using that selector in the usual way returns nothing:

Data %>% html_nodes(".job-title") %>% html_text2()

The alternative of copying the XPath that SelectorGadget generates does not get the relevant data into R either. Can anyone help?

Best, Christoph

Upvotes: 0

Views: 231

Answers (1)

Earl Mascetti

Reputation: 1336

Below is a possible solution with RSelenium.

This way you can scrape all 72 pages.

install.packages('RSelenium')
library(RSelenium)

# Start a Selenium-driven Firefox session
rD <- RSelenium::rsDriver(browser = "firefox", check = FALSE)
remDr <- rD[["client"]]

base_url <- "https://gehaltsreporter.de/stellenangebote-jobs/?q=Immobilienkaufmann"
all_data <- list()
for (j in 1:72) {
  # Load results page j and give it time to render
  remDr$navigate(paste0(base_url, "&pg=", j))
  Sys.sleep(2)
  # Assumes each results page lists 14 entries
  for (i in 1:14) {
    # Select the i-th job entry in the results list
    x <- remDr$findElement(using = 'xpath', value = paste0('//*[@id="results"]/ul/li[', i, ']'))
    x <- x$getElementText()[[1]]
    # Collapse the entry onto one line and append it to the results
    all_data[[length(all_data) + 1]] <- gsub('\n', ' ', x)
  }
}
print(all_data)

[[1]]
[1] "Kundenberater:in für Kabel-TV und Internet-Lösungen Vodafone Deutschland GmbH Peine (DE) 03.10.2021 36.184 € - 49.284 € schätzt Gehaltsreporter.de"

[[2]]
[1] "Kundenberater:in für Kabel-TV und Internet-Lösungen Vodafone Deutschland GmbH Peine (DE) 03.10.2021 36.184 € - 49.284 € schätzt Gehaltsreporter.de"

[[3]]
[1] "Kundenberater:in für Kabel-TV und Internet-Lösungen Vodafone Deutschland GmbH Peine (DE) 03.10.2021 36.184 € - 49.284 € schätzt Gehaltsreporter.de"

[[4]]
[1] "Kundenberater:in für Kabel-TV und Internet-Lösungen Vodafone Deutschland GmbH Peine (DE) 03.10.2021 36.184 € - 49.284 € schätzt Gehaltsreporter.de"

....
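Each scraped entry is one flat string, while the question asks for separate fields. As a rough sketch, the release date and the two salary bounds can be pulled out with regular expressions in base R. The helper name `parse_entry` is hypothetical, and the patterns assume every entry follows the format of the sample above (a `dd.mm.yyyy` date and salaries written with `.` as the thousands separator). Splitting title, company, and location is not attempted here, because the flattened text has no delimiter between them.

```r
# Sample entry copied from the output above
entry <- "Kundenberater:in für Kabel-TV und Internet-Lösungen Vodafone Deutschland GmbH Peine (DE) 03.10.2021 36.184 € - 49.284 € schätzt Gehaltsreporter.de"

# Hypothetical helper: extract release date and salary bounds from one entry
parse_entry <- function(entry) {
  # Release date in dd.mm.yyyy form
  date_str <- regmatches(entry, regexpr("[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}", entry))
  # Numbers with "." as thousands separator, e.g. "36.184";
  # the word boundaries keep the pattern from matching inside the date
  salaries <- regmatches(
    entry,
    gregexpr("\\b[0-9]{1,3}(\\.[0-9]{3})+\\b", entry, perl = TRUE)
  )[[1]]
  salaries <- as.numeric(gsub(".", "", salaries, fixed = TRUE))
  data.frame(
    release_date = as.Date(date_str, format = "%d.%m.%Y"),
    salary_lower = salaries[1],
    salary_upper = salaries[2]
  )
}

parsed <- parse_entry(entry)
```

To process the whole result, something like `do.call(rbind, lapply(all_data, parse_entry))` should give one row per entry, provided every entry contains a date and both salary figures.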

Upvotes: 1
