M_D

Reputation: 297

Scrape website links in R

When scraping links in R with either rvest or RSelenium, you can select them by defining the beginning of the HTML tag, e.g. a href within a given node. But what if I face the following two links:

<a href="some_link" data-tracking="click_body" data-tracking-data='{"touch_point_button":"photo"}' data-featured-tracking="listing_no_promo">

<a href="some_link" data-tracking="click_body" data-tracking-data='{"touch_point_button":"photo"}' data-featured-tracking="listing_promo">

As you can see, they differ only in the very last part. How can I grab (select) only the links with promo / no promo?
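To illustrate the kind of selection I mean, here is a minimal rvest sketch of the usual approach (the URL is just the listing page mentioned below, used as an example):

```r
library(rvest)

# Example page, just to illustrate the usual approach
page <- read_html("https://www.otodom.pl/sprzedaz/mieszkanie/")

# Grab every link by defining the node and tag up front
all_links <- page %>%
  html_nodes("a") %>%
  html_attr("href")
```

This gets all hrefs, but it cannot distinguish the promo links from the no_promo ones.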

Upvotes: 0

Views: 221

Answers (2)

Use XPath and the XML library. Assuming that you are looking for the no_promo links:

library(XML)
library(httr)
response <- GET(yourLink)
parsedoc <- htmlParse(content(response, as = "text"))
xpathSApply(parsedoc, "//a[@data-featured-tracking='listing_no_promo']", xmlGetAttr, "href")

or, if you are just looking for the links whose data-featured-tracking attribute contains the keyword "no_promo", the last line becomes:

xpathSApply(parsedoc, "//a[contains(@data-featured-tracking, 'no_promo')]", xmlGetAttr, "href")
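The same XPath also works through rvest, if you prefer that stack; a sketch, with yourLink standing in for the real URL as above:

```r
library(rvest)

page <- read_html(yourLink)  # yourLink is a placeholder, as above
no_promo <- html_attr(
  html_nodes(page, xpath = "//a[contains(@data-featured-tracking, 'no_promo')]"),
  "href"
)
```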

Upvotes: 1

niko

Reputation: 5281

So let's define links to be your object containing the html strings, e.g.

 links <- html_children(read_html("https://www.otodom.pl/sprzedaz/mieszkanie/"))

Then you can use regular expressions to match "promo" / "no_promo" within those strings:

p1 <- grepl("promo", links, fixed = TRUE)
p1
[1] TRUE TRUE
p2 <- grepl("no_promo", links, fixed = TRUE)
p2
[1] FALSE  TRUE

So links[p1] contains all strings containing "promo" (which includes the "no_promo" ones as well, since "no_promo" contains "promo"), and links[p2] contains all strings containing "no_promo". Now all that remains is to subset:

promo <- links[p1 & !p2] # strings with "promo" but not "no_promo"
no.promo <- links[p2]    # strings with "no_promo"
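If you want to sidestep the overlap between the two patterns entirely, you can match the full attribute value instead; a sketch with inline example strings mimicking the question's HTML:

```r
# Example strings modelled on the two <a> tags from the question
links <- c(
  '<a href="link1" data-featured-tracking="listing_promo">',
  '<a href="link2" data-featured-tracking="listing_no_promo">'
)

# Including the surrounding quote makes each match exact:
# "listing_promo" is not a substring of "listing_no_promo"
promo    <- links[grepl('"listing_promo"', links, fixed = TRUE)]
no.promo <- links[grepl('"listing_no_promo"', links, fixed = TRUE)]
```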

Upvotes: 1
