Reputation: 21
I need to extract each of the following 3 addresses separately (the lines that come before the phone numbers) from this hideous HTML, but I am absolutely stumped:
<div class='additional-locations collapsible'>
<div class='row'>
<div class='location'>
CompanyName<br /> 123 Some Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
<br />
<strong>County:</strong> County<br />
<strong>Electoral District:</strong> 01<br />
<hr />
CompanyName<br /> 546 SomeOther Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
<br />
<strong>County:</strong> County<br />
<strong>Electoral District:</strong> 02<br />
<hr />
CompanyName<br /> 378 Another Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
<br />
<strong>County:</strong> County<br />
<strong>Electoral District:</strong> 03<br />
</div>
</div>
</div>
I thought I would query for
//div[contains(@class,'additional-practice-location')]//div[@class='practice-location']/text()[preceding::strong[contains(text(), 'Phone')][1]]
and try to grab the text before it, but I can't figure it out. Can anyone help?
Upvotes: 2
Views: 57
Reputation: 52665
Since you've added the xpath-2.0 tag, try the XPath 2.0 expression below to get the required data:
for $i in //div[@class='location']/text()[normalize-space()="CompanyName"]
return $i/string-join(following-sibling::text()[position()<4]/normalize-space(), ", ")
Output:
123 Some Street, City Province PostalCode, Country
546 SomeOther Street, City Province PostalCode, Country
378 Another Street, City Province PostalCode, Country
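If you don't have an XPath 2.0 processor at hand, the same idea (find each "CompanyName" text node, then take the next three sibling text nodes) can be sketched in plain Python with the standard library's html.parser. This is only a rough sketch under some assumptions: the names LocationText and extract_addresses are made up for illustration, the sample HTML is a trimmed copy of the question's markup, and whitespace-only text nodes are skipped rather than counted.

```python
from html.parser import HTMLParser

HTML = """<div class='additional-locations collapsible'>
<div class='row'>
<div class='location'>
CompanyName<br /> 123 Some Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
<br />
<hr />
CompanyName<br /> 546 SomeOther Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
<br />
<hr />
CompanyName<br /> 378 Another Street<br />City Province PostalCode<br />Country<br /><strong>Phone:</strong>123 456 7890<br /><strong>Fax:</strong> 123 456 7890
<br />
</div>
</div>
</div>"""


class LocationText(HTMLParser):
    """Collects text nodes that are direct children of <div class="location">."""

    VOID = {"br", "hr"}  # elements without content; they never change nesting depth

    def __init__(self):
        super().__init__()
        self.depth = 0   # 0 = outside div.location, 1 = directly inside it
        self.nodes = []  # whitespace-normalized, non-empty text nodes

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif tag == "div" and "location" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # rough equivalent of normalize-space()
        if self.depth == 1 and text:   # skip text inside <strong> and blank nodes
            self.nodes.append(text)


def extract_addresses(html, marker="CompanyName"):
    """Join the three text nodes that follow each marker node with ', '."""
    parser = LocationText()
    parser.feed(html)
    parser.close()
    nodes = parser.nodes
    return [", ".join(nodes[i + 1:i + 4]) for i, text in enumerate(nodes) if text == marker]


for address in extract_addresses(HTML):
    print(address)
```

Like the XPath version, this relies on every address starting with a text node equal to "CompanyName" and having exactly three address lines after it, so it would need adjusting if real entries vary in length.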
Upvotes: 1