SebastianM

Reputation: 45

How to print values from different fields combined?

I'm using Xidel and experimenting with web scraping (without templates for now). I would like to extract the title and price of each book so they can be printed on one line per entry:

title --> price

Based on an answer from this forum I can write:

./xidel -e 'doc("https://books.toscrape.com")//*[self::p[@class="price_color"] or self::h3]'

But how can I print each title and its price on the same line?

Thank you

Upvotes: 0

Views: 107

Answers (3)

Reino

Reputation: 3443

Need to remember: Check the HTML structure first!

If an HTML source is minified, or prettified in an unreadable way (with lots of illogical indentation, for instance), then for a better overview of all the element nodes I'd recommend either of the following two commands:

$ xidel -s "https://books.toscrape.com" -e . --output-node-format=xml --output-node-indent
$ xidel -se 'serialize(doc("https://books.toscrape.com"),{"indent":true()})'

Then you'll quickly notice that the elements you're after sit at a fixed depth below the <article> element: the <h3> is a direct child, and the price <p> is a child of a <div>. So the descendant axis (.//) isn't necessary. And since you're only dealing with nodes, you don't really need the ! (simple map operator) either; the path operator / works just as well:

$ xidel -s "https://books.toscrape.com" -e '
  //article/join((div/p[@class="price_color"],h3),";")
'

Personally, I only use x:join() / string-join() for combining three or more items. For two items I always use simple string concatenation:

$ xidel -s "https://books.toscrape.com" -e '
  //article/(div/p[@class="price_color"]||";"||h3)
'
$ xidel -s "https://books.toscrape.com" -e '
  //article/concat(div/p[@class="price_color"],";",h3)
'
$ xidel -s "https://books.toscrape.com" -e '
  //article/x"{div/p[@class="price_color"]};{h3}"
'

The last one is Xidel's own extended-string-syntax.

Upvotes: 1

Martin Honnen

Reputation: 167716

Try

./xidel -e 'doc("https://books.toscrape.com")//article[@class = "product_pod"]!(.//h3 || "-->" || .//p[@class="price_color"])'

Upvotes: 1

SebastianM

Reputation: 45

I followed Martin's advice and checked the HTML structure, and indeed there is an <article> element in the markup that should be used. Martin's solution works, and the one I arrived at (probably at the same time) is:

./xidel -e 'doc("https://books.toscrape.com")//article ! string-join((.//p[@class="price_color"], .//h3), ";")'

Need to remember: Check the HTML structure first!

Issue solved.

Upvotes: 0
