Anna Jean

Reputation: 11

XPath Wordpress Scraper plugin

I'm trying to scrape the city and the state (province) separately using either XPath or regex. I can select the city and state together, separated by a comma, such as

Trail, BC (page link)

by XPath:

(//div[contains(text(), ",")])[1]
/div[1]/div[1]/div[3]/div/div/div[1]/div[1]/div[3]/div[2]/div/div/div/div[4]

or by Regex:

([A-Za-z]+)(,\s)(AB|BC|ON)
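
To check what each capturing group of a pattern like this returns outside the plugin, a minimal Python sketch follows. It assumes the standard `re` module, writes the letter class as `[A-Za-z]` so that only ASCII letters match, and uses a hypothetical helper name (`split_location`) and sample strings chosen purely for illustration; the plugin's own regex engine may behave differently.

```python
import re

# Group 1 = city, group 2 = separator, group 3 = province code.
LOCATION_RE = re.compile(r"([A-Za-z]+)(,\s)(AB|BC|ON)")


def split_location(text):
    """Return (city, province), or None when the pattern does not match."""
    match = LOCATION_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(3)


print(split_location("Trail, BC"))  # ('Trail', 'BC')
```

Because the letter class does not match spaces, a multi-word city such as "Grand Forks, BC" would yield only the last word ("Forks") as the city.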

However, when I try to scrape only the city or only the province with substring-before or substring-after, such as

XPath 2.0:

(//div[contains(text(), ",")])[1]/substring-after(text(), ",")

or XPath 1.0:

substring-after((//div[contains(text(), ",")])[1], ",")

the plugin is unable to return the city only. Is there anything wrong with the syntax?

Upvotes: 0

Views: 143

Answers (1)

E.Wiest

Reputation: 5915

Use relative XPath expressions.

//span[@data-indeed-apply-joblocation]/@data-indeed-apply-joblocation

Output : Trail, BC

substring-before(//span[@data-indeed-apply-joblocation]/@data-indeed-apply-joblocation,",")

Output : Trail

substring-after(//span[@data-indeed-apply-joblocation]/@data-indeed-apply-joblocation,", ")

Output : BC
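
The three results above can be reproduced outside the plugin to check the logic. The sketch below is only an illustration, not the plugin's behavior: it assumes a made-up HTML fragment (the real page markup is not shown here), uses the standard library's `xml.etree.ElementTree`, whose limited XPath support has no `substring-before` or `substring-after`, and does the splitting with `str.partition` instead. The helper name `job_location` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the real page markup is not shown in this thread.
SAMPLE = """
<div>
  <span data-indeed-apply-joblocation="Trail, BC">Apply</span>
</div>
"""

ATTR = "data-indeed-apply-joblocation"


def job_location(markup):
    """Return the attribute value of the first span that carries ATTR."""
    root = ET.fromstring(markup)
    span = root.find(f".//span[@{ATTR}]")
    return None if span is None else span.get(ATTR)


location = job_location(SAMPLE)
# For a value containing ", ", partition yields the same pieces as
# substring-before(value, ", ") and substring-after(value, ", ").
city, _, province = location.partition(", ")
print(location)  # Trail, BC
print(city)      # Trail
print(province)  # BC
```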

EDIT : Since substring functions are not supported by the plugin, use a regex to clean the result. Keep the XPath I've provided (set the "Part" in the right panel to "Text Content"). Then, in the "Transform" menu ("Advance mode" in the right panel), under "Find & Replace", input the following regex :

^.+,\W

Replace with nothing.

Output : BC
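
The find-and-replace step can be tried out in Python before entering it in the plugin. This is a minimal sketch assuming the standard `re` module behaves like the plugin's regex engine for this simple pattern, which is not confirmed here. `PREFIX_RE` is the pattern given above; `SUFFIX_RE` and both function names are hypothetical additions showing one way to keep the city instead.

```python
import re

# Pattern from the answer: everything up to the last comma, plus one
# non-word character (the space after it).
PREFIX_RE = re.compile(r"^.+,\W")

# Assumed complementary pattern (not from the answer): the first comma
# and everything after it.
SUFFIX_RE = re.compile(r",.+$")


def keep_province(location):
    """Remove the 'City, ' prefix, leaving the province code."""
    return PREFIX_RE.sub("", location)


def keep_city(location):
    """Remove the ', XX' suffix, leaving the city name."""
    return SUFFIX_RE.sub("", location)


print(keep_province("Trail, BC"))  # BC
print(keep_city("Trail, BC"))      # Trail
```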

Upvotes: 0
