Reputation: 113
I am trying to transform a tokenized string (an English sentence) into HTML span tags so it can be displayed on a web page.
Here are the basic steps I am trying to perform:
1. Wrap the tagged sentence in <root></root> to make it valid XML.
2. Convert each token into a <span class=token> element.
The problem is that my output is missing the whitespace nodes, #(Text " "), which are present in the Nokogiri object (step 7 in pry).
Any pointers to the right Nokogiri method to use would be highly appreciated. Any other suggestions are welcome too.
You can view the code:
require 'nokogiri'
sentence_tagged = '<det>A</det> <nn>fleet</nn> <in>of</in> <nns>warships</nns><stop>.</stop>'
sentence_xml = '<root>' + sentence_tagged + '</root>'
nok_sent = Nokogiri::XML(sentence_xml)
array = []
nok_sent.root.element_children.each { |child| array << "<span class='" + child.name + "'>" + child.text + "</span>" }
array
# => ["<span class='det'>A</span>",
# "<span class='nn'>fleet</span>",
# "<span class='in'>of</span>",
# "<span class='nns'>warships</span>",
# "<span class='stop'>.</span>"]
array.join
# => "<span class='det'>A</span><span class='nn'>fleet</span><span class='in'>of</span><span class='nns'>warships</span><span class='stop'>.</span>"
Upvotes: 2
Views: 71
Reputation: 37409
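You should use children instead of element_children. element_children returns only the element nodes, so the whitespace text nodes between the tags are dropped, while children returns every child node, including text nodes: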
array = []
nok_sent.root.children.each { |child| array << "<span class='" + child.name + "'>" + child.text + "</span>" }
array
# => ["<span class='det'>A</span>", "<span class='text'> </span>", "<span class='nn'>fleet</span>", "<span class='text'> </span>", "<span class='in'>of</span>", "<span class='text'> </span>", "<span class='nns'>warships</span>", "<span class='stop'>.</span>"]
array.join
# => "<span class='det'>A</span><span class='text'> </span><span class='nn'>fleet</span><span class='text'> </span><span class='in'>of</span><span class='text'> </span><span class='nns'>warships</span><span class='stop'>.</span>"
Upvotes: 3