Yulia

Reputation: 1585

Get HTML structure using Nokogiri

My task is to get the HTML structure of a document without its data (the text content). From:

<html>
  <head>
    <title>Hello!</title>
  </head>
  <body id="uniq">
    <h1>Hello World!</h1>
  </body>
</html>

I want to get:

<html>
  <head>
    <title></title>
  </head>
  <body id="uniq">
    <h1></h1>
  </body>
</html>

There are a number of ways to extract data with Nokogiri, but I couldn't find a way to perform the reverse task.

UPDATE: The solution I found combines the two answers I received:

require 'nokogiri'

doc = Nokogiri::HTML(File.read("test.html"))
doc.at_css("html").traverse do |node|
  node.remove if node.text?   # drop text nodes, keep elements and attributes
end
puts doc

The output is exactly the one I want.

Upvotes: 2

Views: 996

Answers (2)

pguardiario

Reputation: 54984

It sounds like you want to remove all the text nodes. You can do this like so:

doc.xpath('//text()').remove
puts doc

Upvotes: 4

Larry K

Reputation: 49104

Traverse the document. For each node, delete what you don't want. Then write out the document.

Remember that Nokogiri modifies the document in place.

Upvotes: 1
