Reputation: 27
I want to scrape some web pages with Scrapy. Everything works fine, but I need to find a field containing a number, and that field is sometimes the second, the third or the fourth 'li' in the list. Here is the relevant markup from the page:
<ul class="basic-product-information key-value-list">
<li>
<span class="key">Sprache:</span>
<strong class="value">Unbekannt</strong>
</li>
<li>
<span class="key">Plattform:</span>
<span class="value">Bücher</span>
</li>
<li>
<span class="key">EAN / ISBN:</span>
<span class="value">9783442158126</span>
</li>
</ul>
The value I want to get as result is 9783442158126.
At the moment I am locating the values in this list with:
//*[@id="book-info"]/ul/li[x]/span[2]
I am parsing all the 'li' (1, 2, 3, 4, 5) and then I get a CSV which I have to edit by hand, because I just need the ISBN - not the other things.
Is there a way to automate this? Perhaps I can tell XPath to search for 13-digit numbers?
Thank you very much.
Best regards, Julian
Upvotes: 0
Views: 75
Reputation: 36282
You could use an implicit and, by chaining predicates in square brackets, and check:
1.- Its length, with the string-length() function.
2.- That it is a number, by converting it with the number() function and comparing the result with itself. Text that is not numeric converts to NaN, and NaN is never equal to NaN, so the comparison fails for strings. (number() would turn the booleans false and true into 0 and 1, but a text node is always a string, so that case does not arise here.)
So try with:
//ul/li/span[2][number(text()) = number(text())][string-length() = 13]
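In a Scrapy callback the expression above can be passed as a string to response.xpath(...). To show the same two checks outside a spider, here is a minimal sketch that uses only the Python standard library. It is not the XPath evaluation itself: ElementTree does not support number() or string-length(), so the filter is applied in Python after selecting the candidates. The sample markup, the wrapping div and the name find_isbn13 are assumptions made for illustration, and the digit check is stricter than number(), which would also accept signs and decimals.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the question (assumed to be well-formed).
HTML = """<div id="book-info">
<ul class="basic-product-information key-value-list">
<li><span class="key">Sprache:</span><strong class="value">Unbekannt</strong></li>
<li><span class="key">Plattform:</span><span class="value">Bücher</span></li>
<li><span class="key">EAN / ISBN:</span><span class="value">9783442158126</span></li>
</ul>
</div>"""


def find_isbn13(markup):
    """Return the text of every second <span> inside an <li> that is exactly 13 ASCII digits."""
    root = ET.fromstring(markup)
    results = []
    # span[2] selects the second <span> child of each <li>, as in the XPath above.
    for span in root.findall(".//ul/li/span[2]"):
        text = (span.text or "").strip()
        # Python counterpart of the string-length() and number() predicates.
        if len(text) == 13 and text.isascii() and text.isdigit():
            results.append(text)
    return results


print(find_isbn13(HTML))  # ['9783442158126']
```

The first 'li' is skipped because its value sits in a strong element rather than a second span, and the second is skipped because Bücher is neither 13 characters long nor numeric.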
UPDATE: To achieve the new requirement asked in comments, the easiest path is to combine two expressions with the union operator |, which behaves like an or in XPath. To match values that end in X, use substring-before() to get the numeric part and increase the expected string-length() by one:
//ul/li/span[2][number(text()) = number(text())][string-length() = 13] |
//ul/li/span[2][number(substring-before(text(), "X")) = number(substring-before(text(), "X"))][string-length() = 14]
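If the values are extracted first and filtered afterwards, the same rule can be approximated in Python with a regular expression. This is a sketch under assumptions: the name looks_like_isbn is made up for illustration, and the pattern is stricter than the XPath union because it requires the X to be the final character and allows only ASCII digits before it.

```python
import re

# 13 ASCII digits, optionally followed by a single trailing X.
ISBN_PATTERN = re.compile(r"[0-9]{13}X?")


def looks_like_isbn(value):
    """Return True if value is 13 digits, or 13 digits followed by X."""
    return ISBN_PATTERN.fullmatch(value.strip()) is not None


candidates = ["Unbekannt", "Bücher", "9783442158126", "9783442158126X"]
print([c for c in candidates if looks_like_isbn(c)])
```

Such a filter could be applied to each extracted value before it is written to the CSV, so the manual clean-up step is no longer needed.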
Upvotes: 1