Reputation: 173
Pleased to be member of StackOverflow, a long time lurker in here.
I need to parse text between two tags, so far I've found a wonderful tool called Xidel
I need to parse text in between
<div class="description">
Text. <tag>Also tags.</tag> More text.
</div>
However, said text can include HTML tags in it, and I want them to be printed out in raw format. So using a command like:
xidel --xquery '//div[@class="description"]' file.html
Gets me:
Text. Also tags. More text.
And I need it to be exactly as it is, so:
Text. <tag>Also tags.</tag> More text.
How can I achieve this?
Regards, R
Upvotes: 4
Views: 1825
Reputation: 3935
You can show the tags by adding the --output-format=xml
option.
xidel --xquery '//div[@class="description"]' --output-format=xml file.html
Upvotes: 1
Reputation: 321
Can be done in a couple of ways with Xidel, which is why I love it so much.
HTML-templating:
xidel -s file.html -e "<div class='description'>{inner-html()}</div>"
XPath:
xidel -s file.html -e "//div[@class='description']/inner-html()"
CSS:
xidel -s file.html -e "inner-html(css('div.description'))"
BTW, on Linux: swap the double quotes for single and vice versa.
Upvotes: 4