Reputation: 21
I'm out of ideas trying to scrape data from a job portal. Basic rvest and xml2 usage doesn't get me to my goal: extracting the job title, company, location, posting date, and the lower and upper salary bounds.
Below are the primitive beginnings of my code:
library(dplyr)
library(rvest)
library(xml2)
# Download and parse the static HTML of the search results page
Data <- read_html("https://gehaltsreporter.de/stellenangebote-jobs/?q=Immobilienkaufmann")
SelectorGadget identifies the job title (e.g. “Immobilienkaufmann”, among others) with the selector “.job-title”, but using that selector in the usual way does not work:
Data %>% html_nodes(".job-title") %>% html_text2()
Copying the XPath version of the SelectorGadget selector instead doesn't get the relevant data into R either. Can anyone help?
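One way to narrow this down is to count how many nodes the selector matches in the HTML that `read_html()` actually downloaded. The sketch below uses a hypothetical helper `count_matches()` and a made-up HTML snippet (not the site's real markup) so it runs offline. If the same check on the live page returns 0 while the browser shows listings, that suggests (an assumption to verify, not something confirmed here) that the results are inserted by JavaScript, which `read_html()` does not execute.

```r
library(rvest)  # re-exports read_html() and the %>% pipe

# Hypothetical helper: number of nodes a CSS selector matches in a parsed page
count_matches <- function(doc, css) {
  length(html_elements(doc, css))
}

# Offline stand-in for the real page (invented markup, for illustration only)
demo <- read_html('<ul><li><h2 class="job-title">Immobilienkaufmann</h2></li></ul>')

count_matches(demo, ".job-title")                        # 1
demo %>% html_elements(".job-title") %>% html_text2()    # "Immobilienkaufmann"
```

On the live page the equivalent check would be `count_matches(Data, ".job-title")`.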
Best, Christoph
Upvotes: 0
Views: 231
Reputation: 1336
Below is a possible solution with RSelenium.
This way you can scrape all 72 result pages.
install.packages('RSelenium')
library(RSelenium)

# Start a Selenium server plus Firefox and get the client object
rD <- RSelenium::rsDriver(browser = "firefox", check = FALSE)
remDr <- rD[["client"]]
remDr$navigate("https://gehaltsreporter.de/stellenangebote-jobs/?q=Immobilienkaufmann")

all_data <- list()
for (j in 1:72) {        # 72 result pages
  for (i in 1:14) {      # assumes 14 listings per page; findElement errors if li[i] is missing
    # Select the i-th result item on the current page
    x <- remDr$findElement(using = 'xpath', value = paste0('//*[@id="results"]/ul/li[', i, ']'))
    x <- x$getElementText()[[1]]
    x <- gsub('\n', ' ', x)                # flatten the listing to one line
    all_data[[(j - 1) * 14 + i]] <- x     # running index, so later pages don't overwrite earlier ones
  }
  remDr$navigate(paste0("https://gehaltsreporter.de/stellenangebote-jobs/?q=Immobilienkaufmann&pg=", j))
  Sys.sleep(2)           # give the next page time to load
}

# Close the browser and stop the Selenium server
remDr$close()
rD$server$stop()

print(all_data)
Each element of all_data holds one listing flattened into a single string, for example:
[1] "Kundenberater:in für Kabel-TV und Internet-Lösungen Vodafone Deutschland GmbH Peine (DE) 03.10.2021 36.184 € - 49.284 € schätzt Gehaltsreporter.de"
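The loop yields one flattened string per listing, while the question asks for separate fields. A minimal sketch, assuming every string follows the layout of the sample output (date as `dd.mm.yyyy`, salary as `lower € - upper €` with `.` as thousands separator), is a hypothetical helper `parse_listing()` that extracts those parts with regular expressions. Title, company and location stay together in one column, because the flattened text has no reliable separator between them.

```r
# Hypothetical helper: split one flattened listing string into fields.
# Assumes the layout of the sample output:
# "<title> <company> <location> dd.mm.yyyy <lower> € - <upper> € ..."
parse_listing <- function(txt) {
  date_pat   <- "[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}"
  salary_pat <- "([0-9.]+) \u20ac - ([0-9.]+) \u20ac"   # \u20ac is the euro sign

  m        <- regexpr(date_pat, txt)
  date_str <- if (m > 0) regmatches(txt, m) else NA_character_

  # sal[2] and sal[3] are the two capture groups; NA if the pattern is absent
  sal <- regmatches(txt, regexec(salary_pat, txt))[[1]]

  # Drop the German thousands separator "." before converting to numbers
  to_num <- function(s) as.numeric(gsub(".", "", s, fixed = TRUE))

  # Everything before the date: title, company and location run together
  head_txt <- trimws(sub(paste0("[[:space:]]*", date_pat, ".*$"), "", txt))

  data.frame(
    title_company_location = head_txt,
    date                   = as.Date(date_str, format = "%d.%m.%Y"),
    salary_lower           = to_num(sal[2]),
    salary_upper           = to_num(sal[3]),
    stringsAsFactors       = FALSE
  )
}

example <- "Kundenberater:in f\u00fcr Kabel-TV und Internet-L\u00f6sungen Vodafone Deutschland GmbH Peine (DE) 03.10.2021 36.184 \u20ac - 49.284 \u20ac sch\u00e4tzt Gehaltsreporter.de"
parse_listing(example)
```

Applied to the scraped list, `do.call(rbind, lapply(all_data, parse_listing))` would give one row per listing.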
Upvotes: 1