Reputation: 1585
My task is to get the HTML structure of the document without data. From:
<html>
<head>
<title>Hello!</title>
</head>
<body id="uniq">
<h1>Hello World!</h1>
</body>
</html>
I want to get:
<html>
<head>
<title></title>
</head>
<body id="uniq">
<h1></h1>
</body>
</html>
There are a number of ways to extract data with Nokogiri, but I couldn't find a way to perform the reverse task.
UPDATE: The solution I ended up with combines two of the answers I received:
require 'nokogiri'

doc = Nokogiri::HTML(File.read("test.html"))
# Visit every node under <html> and drop the text nodes
doc.at_css("html").traverse do |node|
  node.remove if node.text?
end
puts doc
The output is exactly the one I want.
Upvotes: 2
Views: 996
Reputation: 54984
It sounds like you want to remove all the text nodes. You can do this like so:
doc.xpath('//text()').remove
puts doc
Upvotes: 4