Reputation: 45
I'm using xidel and playing with web scraping (without templates for now). I would like to get the title and price of each book so they can be printed on one line per entry:
title --> price
Based on an answer from this forum I can write:
./xidel -e 'doc("https://books.toscrape.com")//*[self::p[@class="price_color"] or self::h3]'
But how do I write each title and its price on one line?
Thank you
Upvotes: 0
Views: 107
Reputation: 3443
Need to remember: Check the HTML structure first!
If an HTML source is minified, or unreadably prettified (with lots of illogical indentations for instance), then for a better overview of all the element-nodes I'd recommend either of the following 2 commands:
$ xidel -s "https://books.toscrape.com" -e . --output-node-format=xml --output-node-indent
$ xidel -se 'serialize(doc("https://books.toscrape.com"),{"indent":true()})'
Then you'll quickly notice that the text-nodes you're after are direct children of the <article>-element-node and not descendants (.// not necessary). And since it's all (text-)nodes you're dealing with, you don't really need the ! (simple map operator):
$ xidel -s "https://books.toscrape.com" -e '
//article/join((div/p[@class="price_color"],h3),";")
'
And personally, I only use x:join() / string-join() for combining 3 items or more. For 2 items I always do a simple string-concatenation:
$ xidel -s "https://books.toscrape.com" -e '
//article/(div/p[@class="price_color"]||";"||h3)
'
$ xidel -s "https://books.toscrape.com" -e '
//article/concat(div/p[@class="price_color"],";",h3)
'
$ xidel -s "https://books.toscrape.com" -e '
//article/x"{div/p[@class="price_color"]};{h3}"
'
The last one is Xidel's own extended-string-syntax.
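The commands above print price;title, while the question asked for title --> price. A minimal sketch of that layout follows, under two assumptions that are not confirmed anywhere in this thread: that each <h3> wraps an <a> whose title attribute holds the full book title, and that the inline $sample string (a hypothetical stand-in for one product entry, so the command can be tried offline) matches the real page's structure.

```shell
# Hypothetical one-entry sample that mimics the assumed structure of the page.
sample='<html><head><meta charset="utf-8"></head><body>
<article class="product_pod">
  <h3><a href="x.html" title="A Light in the Attic">A Light in the ...</a></h3>
  <div class="product_price"><p class="price_color">£51.77</p></div>
</article>
</body></html>'

# Title first, then the price, joined by " --> " (one line per <article>).
# h3/a/@title is an assumption about the markup; plain h3 would also work.
expr='//article/concat(h3/a/@title, " --> ", div/p[@class="price_color"])'

# Only run if xidel is installed; "-" makes xidel read the HTML from stdin.
if command -v xidel >/dev/null 2>&1; then
  printf '%s' "$sample" | xidel -s - -e "$expr"
fi
```

Against the live site, the same expression should slot into the earlier commands, e.g. xidel -s "https://books.toscrape.com" -e "$expr".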
Upvotes: 1
Reputation: 167716
Try
./xidel -e 'doc("https://books.toscrape.com")//article[@class = "product_pod"]!(.//h3 || "-->" || .//p[@class="price_color"])'
Upvotes: 1
Reputation: 45
I followed Martin's advice and checked the HTML structure, and indeed there was an <article> element in the code that should be used. Martin's solution works, and the one I came up with, probably at the same time, is:
./xidel -e 'doc("https://books.toscrape.com")//article ! string-join((.//p[@class="price_color"], .//h3), ";")'
Need to remember: Check the HTML structure first!
Issue solved
Upvotes: 0