idleberg

Reputation: 12882

How to replace outer tags using Nokogiri

Using Nokogiri, I'm trying to replace the outer tags of an HTML node, where the most reliable way to detect that node is through one of its children.

Before:

<div>
    <div class="smallfont" >Quote:</div>
    Words of wisdom
</div>

After:

<blockquote>
    Words of wisdom
</blockquote>

The following code snippet detects the element I'm after, but I'm not sure how to go on from there:

doc = Nokogiri::HTML(html)  

if doc.at('div.smallfont:contains("Quote:")') != nil
    q = doc.parent
    # replace tags of q
    # remove first_sibling
end

Upvotes: 0

Views: 367

Answers (2)

the Tin Man

Reputation: 160551

I'd do it like this:

require 'nokogiri'

doc = Nokogiri::HTML(DATA.read)

smallfont_div = doc.at('.smallfont')
smallfont_div.parent.name = 'blockquote'
smallfont_div.remove

puts doc.to_html 

__END__
<div>
    <div class="smallfont" >Quote:</div>
    Words of wisdom
</div>

Which results in:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<blockquote>

    Words of wisdom
</blockquote>

</body></html>

Browsers normally collapse the leading and trailing whitespace inside <blockquote> when rendering, so it's usually not an issue, though some may still show a leading and/or trailing space.

If you want to clean up the text node containing "Words of wisdom", I'd do this instead:

smallfont_div = doc.at('.smallfont')
smallfont_parent = smallfont_div.parent
smallfont_div.remove
smallfont_parent.name = 'blockquote'
smallfont_parent.content = smallfont_parent.text.strip

Which results in:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<blockquote>Words of wisdom</blockquote>
</body></html>

Alternatively, this generates the same result:

smallfont_div = doc.at('.smallfont')
smallfont_parent = smallfont_div.parent
smallfont_parent_content = smallfont_div.next_sibling.text
smallfont_parent.name = 'blockquote'
smallfont_parent.content = smallfont_parent_content.strip

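What the code does should be easy to follow, since Nokogiri's method names are largely self-explanatory. One thing to keep in mind: content= replaces all of the node's children with a single text node, so any nested markup inside the quote is lost.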

Upvotes: 0

Adam Zapaśnik

Reputation: 683

Does this work for you?

doc = Nokogiri::HTML(html)
if quote = doc.at('div.smallfont:contains("Quote:")')
  text = quote.next # gets the '    Words of wisdom'
  quote.remove # removes div.smallfont
  puts text.parent.replace("<blockquote>#{text}</blockquote>") # replaces wrapping div with blockquote block
end

Upvotes: 1
