RomanM

Reputation: 173

Xidel: extract the contents of a tag as raw output (keeping inner HTML tags)

Pleased to be a member of Stack Overflow; I've been a long-time lurker here.

I need to parse the text between two tags, and so far I've found a wonderful tool for it called Xidel.

Specifically, I need the text in between:

<div class="description">
Text. <tag>Also tags.</tag> More text.
</div>

However, that text can contain HTML tags, and I want them printed in raw form. Using a command like:

xidel --xquery '//div[@class="description"]' file.html

I get:

Text. Also tags. More text.

But I need the content exactly as it appears in the source:

Text. <tag>Also tags.</tag> More text.

How can I achieve this?

Regards, R

Upvotes: 4

Views: 1825

Answers (2)

Cameron Hudson

Reputation: 3935

You can show the tags by adding the --output-format=xml option.

xidel --xquery '//div[@class="description"]' --output-format=xml file.html 
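To see why the plain query in the question loses the markup, here is a minimal sketch using Python's standard-library xml.etree.ElementTree instead of Xidel (an illustrative assumption; it is not how Xidel works internally). Taking the string value of the div concatenates only its text nodes, so the tags disappear. Serializing the div as markup keeps the inner tags, but also the enclosing <div> start and end tags, which the question did not ask for. SAMPLE is a hypothetical, well-formed stand-in for file.html; ElementTree cannot parse arbitrary non-well-formed HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for file.html; must be well-formed XML for ElementTree.
SAMPLE = """<html><body>
<div class="description">
Text. <tag>Also tags.</tag> More text.
</div>
</body></html>"""


def text_only(markup: str) -> str:
    """String value of the div: its text nodes concatenated, tags dropped."""
    div = ET.fromstring(markup).find(".//div[@class='description']")
    return "".join(div.itertext()).strip()


def outer_markup(markup: str) -> str:
    """The div serialized as markup, including its own start and end tags."""
    div = ET.fromstring(markup).find(".//div[@class='description']")
    # ET.tostring also writes the element's tail text (the newline after </div>),
    # so clear it to get just the element itself.
    div.tail = None
    return ET.tostring(div, encoding="unicode")


print(text_only(SAMPLE))     # Text. Also tags. More text.
print(outer_markup(SAMPLE))  # <div class="description"> ... </div>, inner tags kept
```

The first function reproduces the flattened output from the question; the second shows that keeping the markup is a matter of serializing nodes rather than taking their string value.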

Upvotes: 1

MatrixView

Reputation: 321

This can be done in several ways with Xidel, which is why I love it so much.

HTML-templating:

xidel -s file.html -e "<div class='description'>{inner-html()}</div>"

XPath:

xidel -s file.html -e "//div[@class='description']/inner-html()"

CSS:

xidel -s file.html -e "inner-html(css('div.description'))"

BTW, these commands are quoted for the Windows command prompt; on Linux, swap the double quotes for single quotes and vice versa.
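For readers who want to see what "inner HTML" means concretely, here is a minimal sketch of the same idea using Python's standard-library xml.etree.ElementTree rather than Xidel (an assumption for illustration only; it is not Xidel's implementation of inner-html()). The inner markup of an element is its leading text followed by each child serialized as markup; ElementTree's tostring() conveniently includes each child's tail text, so nothing between the children is lost. SAMPLE is a hypothetical, well-formed stand-in for file.html, and ElementTree will reject HTML that is not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for file.html; must be well-formed XML for ElementTree.
SAMPLE = """<html><body>
<div class="description">
Text. <tag>Also tags.</tag> More text.
</div>
</body></html>"""


def inner_markup(markup: str) -> str:
    """Everything between <div class="description"> and </div>, tags included."""
    div = ET.fromstring(markup).find(".//div[@class='description']")
    # Text before the first child, then every child serialized as markup.
    # ET.tostring(child) also emits the child's tail (text after its end tag).
    parts = [div.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in div)
    return "".join(parts).strip()


print(inner_markup(SAMPLE))  # Text. <tag>Also tags.</tag> More text.
```

Note that serializing re-escapes special characters, so a literal & in the text comes back as &amp;, which is usually what you want for raw markup output.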

Upvotes: 4
