Reputation: 13
I'm trying to parse this HTML with Nokogiri:
<div class="times">
<span style="color:"><span style="padding:0 ">‎</span><!-- -->16:45‎</span>
<span style="color:"><span style="padding:0 "> ‎</span><!-- -->19:30‎</span>
<span style="color:"><span style="padding:0 "> ‎</span><!-- -->22:10‎</span>
</div>
I only want the times, collected into an array.
I set up a gsub like this:
block.css('div.times span').text.gsub(" ","").gsub(" ","")
But then I end up with a single string and I'm kind of stuck. Is there an efficient way to do this?
Upvotes: 1
Views: 472
Reputation: 79723
One thing you could do is leave the whitespace in the string, and then use String#split to convert it to an array:
block.css('div.times span').text.gsub(" ","").split(' ')
In this case you might need to strip out the left-to-right markers as well, and I don’t think you need to replace the non-breaking spaces, so you could try this:
block.css('div.times span').text.gsub("\u200e", '').split(' ')
(\u200e is the left-to-right marker.)
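To see the strip-then-split idea without Nokogiri in the way, here is a minimal sketch on a plain string. The sample string is only a guess at what `.text` might return for the HTML in the question, and `extract_times` is a hypothetical helper name. It also turns non-breaking spaces into ordinary spaces first, on the assumption that `split(' ')` only splits on ASCII whitespace, so an NBSP on its own would not separate two times.

```ruby
# Hypothetical helper: strip left-to-right marks, normalise non-breaking
# spaces to ordinary spaces, then split on whitespace.
def extract_times(text)
  text.gsub("\u200e", '')    # drop U+200E (left-to-right mark)
      .tr("\u00a0", ' ')     # U+00A0 (no-break space) -> ordinary space
      .split(' ')            # awk-style split: collapses runs, ignores edges
end

# Assumed stand-in for the string returned by `.text`; the real one may differ.
raw = "\u200e16:45\u200e \u00a0\u200e19:30\u200e \u00a0\u200e22:10\u200e"

p extract_times(raw)
```

If the real text has no ordinary whitespace between the times at all, splitting will not help and a pattern match is the safer route.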
An alternative with Nokogiri is to use XPath instead of CSS, which lets you select just the text nodes you want directly, then use map to convert them to an array of strings:
block.xpath('//div[@class="times"]/span/text()').map(&:text)
Upvotes: 1
Reputation: 54984
Easiest is probably:
block.at('div.times').text.scan(/\d{2}:\d{2}/)
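A quick way to check the scan approach is to run it on a plain string. This is a sketch only: the sample string is an assumed approximation of the div's text, and `scan_times` is a hypothetical helper name. Because scan only collects what matches the pattern, the invisible marks and non-breaking spaces do not need to be removed first.

```ruby
# Hypothetical helper: collect every "two digits, colon, two digits" substring.
def scan_times(text)
  text.scan(/\d{2}:\d{2}/)
end

# Assumed stand-in for the div's text: times wrapped in left-to-right marks
# (U+200E) and separated by non-breaking spaces (U+00A0).
sample = "\u200e16:45\u200e\u00a0\u200e19:30\u200e\u00a0\u200e22:10\u200e"

p scan_times(sample)
```

Note that the pattern does not validate the time: it would also accept something like 99:99, which is fine here as long as the div only contains show times.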
Upvotes: 2