loosecannon

Reputation: 7803

How does one remove <![CDATA[ ]]> tags from around text in XML using Hpricot?

I just want the text out of there without those tags. Does Hpricot.XML have any methods for this?

Upvotes: 5

Views: 1896

Answers (3)

Sarvesh

Reputation: 1202

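require 'hpricot'
require 'open-uri'  # provides URI.open; on Ruby 3.0+ plain Kernel#open no longer fetches URLs

doc = Hpricot::XML(URI.open('http://www.cnn.com/.element/ssi/www/auto/2.0/video/xml/most_popular.xml'))

# For each <video> under <cnn_video>, print the text of its <tease_txt> child.
# inner_text returns the text without the <![CDATA[ ]]> markers.
(doc/:cnn_video/:video).each do |video|
  puts video.at('tease_txt').inner_text
end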

Example output (looks spammy but this is not spam!):

New Reno air crash video shows impact
Teen catches 800-pound gator
Resuming careers post 'don't ask' repeal
Creepy skirt peepers
Bus-sized satellite to hit Earth thi ...
'DWTS' cast hits ballroom for first time
What caused trainer's death at SeaWorld?
What led to Troy Davis clemency denial?
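For readers without Hpricot installed, here is a rough offline equivalent of the loop above using Ruby's bundled REXML library instead (a swapped-in parser, not what this answer uses). The sample XML is a hypothetical two-item stand-in shaped like the CNN feed, and teasers is a hypothetical helper name. REXML's Element#text leaves out the CDATA markers because REXML::CData is a subclass of REXML::Text.

```ruby
require 'rexml/document'

# Hypothetical sample shaped like the CNN feed (cnn_video > video > tease_txt)
SAMPLE_XML = <<~XML
  <cnn_video>
    <video><tease_txt><![CDATA[Teen catches 800-pound gator]]></tease_txt></video>
    <video><tease_txt><![CDATA[Creepy skirt peepers]]></tease_txt></video>
  </cnn_video>
XML

# Returns the text of every video/tease_txt element. Element#text returns the
# value of the first text node, and CDATA counts as one, so no markers appear.
def teasers(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/cnn_video/video/tease_txt').map(&:text)
end

puts teasers(SAMPLE_XML)
```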

Upvotes: 1

Daniel O'Hara

Reputation: 13438

# Walk the parsed document and replace each CDATA node found with its raw content.
doc.search("*") do |element|
  element.swap(element.content) if element.is_a?(Hpricot::CData)
end
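The same idea (swap each CDATA node for plain text) can be sketched with REXML rather than Hpricot, as an assumption for anyone on a Ruby where Hpricot is not available. unwrap_cdata! is a hypothetical helper name. A side benefit is that REXML escapes characters such as < when it serializes the new text nodes, so the output stays well-formed.

```ruby
require 'rexml/document'

# Hypothetical helper: recursively replaces every CDATA child of `node` with an
# ordinary text node holding the same content (a REXML stand-in for Hpricot's swap).
def unwrap_cdata!(node)
  node.children.each do |child|
    if child.is_a?(REXML::CData)
      # `true` (respect_whitespace) avoids collapsing runs of whitespace
      node.replace_child(child, REXML::Text.new(child.value, true))
    elsif child.is_a?(REXML::Element)
      unwrap_cdata!(child)
    end
  end
  node
end

doc = REXML::Document.new('<doc><a><![CDATA[Teen catches 800-pound gator]]></a><b><![CDATA[x < y]]></b></doc>')
unwrap_cdata!(doc)
puts doc.to_s
```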

Upvotes: 2

loosecannon

Reputation: 7803

Use element.inner_text instead of element.inner_html; it removes the CDATA markers for you.
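If you already have a raw markup string (for example, one returned by inner_html), a regex can strip the wrappers. This is a minimal sketch, not a parser: strip_cdata is a hypothetical helper, and it assumes well-formed CDATA sections. Because CDATA content can never contain ]]>, the non-greedy match always stops at the right terminator. The result is unescaped text, so a < from inside a former CDATA section would look like markup if the string were parsed again.

```ruby
# Matches one CDATA section and captures its content; /m lets . span newlines.
CDATA_SECTION = /<!\[CDATA\[(.*?)\]\]>/m

# Hypothetical helper: replaces each <![CDATA[...]]> with just its content.
def strip_cdata(markup)
  markup.gsub(CDATA_SECTION, '\1')
end

puts strip_cdata("<![CDATA[New Reno air crash video shows impact]]>")
```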

Upvotes: 7
