pb_ng

Reputation: 361

xPath for varying node numbers

I am trying to extract addresses using XPath from URLs like

https://www.americangemsociety.org/bradshaw-s-jewelers
https://www.americangemsociety.org/fincher-ozment-jewelers

etc.

However, the position of the address isn't uniform across the pages. Some pages have the address in the 4th paragraph (p) node, others have it in the 2nd, and so on.

I was wondering if I could use an XPath expression that identifies the address by the strong element containing the label "Address:" instead of by a specific node position.

Example of an address within the HTML:

<p><strong class="">Address:</strong> 4355 Montgomery Hwy, Ste 2, Dothan, Alabama 36303-1696</p>

Kindly advise

Thanks

Upvotes: 0

Views: 24

Answers (1)

Martin Honnen

Reputation: 167516

If you use //p[strong[not(normalize-space(@class)) and . = 'Address:']] then you select all p elements, regardless of their position, that have a strong child element whose class attribute is empty or missing and whose content is exactly Address:.
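As a rough illustration of selecting the paragraph by its label rather than by position, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. ElementTree supports only a subset of XPath (no not() or normalize-space()), so the sketch uses the simpler predicate [strong='Address:'] instead of the full expression above. The sample markup, the placeholder phone line, and the function name extract_addresses are made up for the example, and the fragment is assumed to be well-formed; real pages would likely need an HTML parser with full XPath 1.0 support, such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modeled on the example in the question.
# The address paragraph is deliberately not in a fixed position.
SAMPLE = """
<div>
  <p>Some introductory text</p>
  <p><strong class="">Phone:</strong> 000-000-0000</p>
  <p><strong class="">Address:</strong> 4355 Montgomery Hwy, Ste 2, Dothan, Alabama 36303-1696</p>
</div>
"""


def extract_addresses(markup):
    """Return the text that follows each <strong>Address:</strong> label."""
    root = ET.fromstring(markup)
    addresses = []
    # Select every p that has a strong child whose text is exactly 'Address:'.
    for p in root.findall(".//p[strong='Address:']"):
        # Pick the matching strong child (a p could hold several strong elements).
        strong = p.find("strong[.='Address:']")
        # The address is the text right after the strong element (its tail).
        addresses.append((strong.tail or "").strip())
    return addresses


print(extract_addresses(SAMPLE))
```

Because the paragraph is matched by its label, the same code keeps working whether the address sits in the 2nd or the 4th paragraph.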

Upvotes: 1
