Michael Irwin

Reputation: 3149

Removing elements with Nokogiri

Using Nokogiri, how would I remove everything up to and including the opening <body> tag of an HTML document, as well as the closing </body> tag and everything after it?

Upvotes: 0

Views: 563

Answers (1)

mu is too short

Reputation: 434615

The easiest way (IMHO of course) would be to use XPath to extract the <body> element:

require 'nokogiri'

html = '<html><head><title>xxx</title></head><body><p>dsfkj</p><p><b>sdff</b> dsfsdf</p></body></html>'
doc  = Nokogiri::HTML(html)
body = doc.xpath('//body')

Now body is a NodeSet holding just the <body> element (and its children). Then, to get the HTML:

body_html = body.to_s
# "<body>\n<p>dsfkj</p>\n<p><b>sdff</b> dsfsdf</p>\n</body>"

The trick is to extract what you want rather than trying to throw away what you don't want. The end result is the same, but finding the one thing you want is easier than finding a bunch of things you don't want when you have a query language at your disposal.

Upvotes: 3
