Reputation: 11
I'm trying to scrape the city and the state separately, using either XPath or regex. I can already select the city and state together, separated by a comma, such as
Trail, BC (page link)
with XPath:
(//div[contains(text(), ",")])[1]
/div[1]/div[1]/div[3]/div/div/div[1]/div[1]/div[3]/div[2]/div/div/div/div[4]
or by Regex:
([A-z]+)(,\s)(AB|BC|ON)
However, when I try to extract only the city or only the province with substring-before and substring-after, such as:
XPath 2.0 (//div[contains(text(), ",")])[1]/substring-after(text(),",")
or XPath 1.0 substring-after((//div[contains(text(), ",")])[1],",")
the plugin does not return the city (or the province) on its own. Is there anything wrong with the syntax?
Upvotes: 0
Views: 143
Reputation: 5915
Use relative XPath expressions.
//span[@data-indeed-apply-joblocation]/@data-indeed-apply-joblocation
Output : Trail, BC
substring-before(//span[@data-indeed-apply-joblocation]/@data-indeed-apply-joblocation,",")
Output : Trail
substring-after(//span[@data-indeed-apply-joblocation]/@data-indeed-apply-joblocation,", ")
Output : BC
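To see why the second expression splits on ", " (comma plus space) rather than just ",", here is a minimal Python sketch that mimics the two XPath functions. The helper names substring_before and substring_after are my own, written only for illustration. They assume a non-empty separator and return an empty string when the separator is absent, as the XPath functions do.

```python
def substring_before(text: str, sep: str) -> str:
    """Text before the first occurrence of sep, or "" if sep is absent."""
    head, found, _ = text.partition(sep)
    return head if found else ""


def substring_after(text: str, sep: str) -> str:
    """Text after the first occurrence of sep, or "" if sep is absent."""
    _, found, tail = text.partition(sep)
    return tail if found else ""


location = "Trail, BC"  # sample value taken from the question

city = substring_before(location, ",")
province = substring_after(location, ", ")
province_with_space = substring_after(location, ",")

print(repr(city))                 # 'Trail'
print(repr(province))             # 'BC'
print(repr(province_with_space))  # ' BC' (leading space is kept)
```

Splitting on "," alone leaves the leading space in the province, which is why the separator in the substring-after call includes it.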
EDIT : Since the substring functions are not supported, use a regex to clean the result. Keep the XPath I've provided (set the "Part" in the right panel to "Text Content"). Then, in the "Transform" menu ("Advance mode" in the right panel), under "Find & Replace", enter the following regex :
^.+,\W
Replace with nothing.
Output : BC
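If you want to check the find-and-replace step outside the plugin, here is a rough Python sketch of the same idea. It assumes the plugin's regex flavor behaves like Python's re module for this pattern, which I have not verified. The second pattern, for keeping only the city, is my own addition and not part of the steps above.

```python
import re

location = "Trail, BC"  # sample value taken from the question

# Same pattern as above: remove everything up to and including the
# comma and the single non-word character (the space) that follows it.
province = re.sub(r"^.+,\W", "", location)

# Hypothetical complementary pattern: remove the first comma and
# everything after it, leaving only the city.
city = re.sub(r",.*$", "", location)

print(province)  # BC
print(city)      # Trail
```

Note that .+ is greedy, so if the text contains several commas the first pattern strips everything up to the last one.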
Upvotes: 0