Reputation: 5279
I am trying to get the text between two tags, for example:
<b>foo</b>bar<br/>
=> bar
I tried this:
'<b>asdasd</b>qwe<br/>'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/)
and it gives me the correct result ([["qwe"]]).
But when I try this:
'<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/) { |ele|
  puts ele
}
it matches from the first <b> tag all the way to the last <br/> tag and returns that whole span as a single match. I was expecting an array of matches (op1, op2, op3).
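A minimal sketch, reusing the input string above, that calls String#scan without a block and inspects the raw result, to show exactly what the greedy (.*) captured:

```ruby
html = '<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'

# (.*) is greedy: it first consumes the rest of the string, then backtracks
# only until <br\/> can match, which happens at the LAST <br/>.
greedy = html.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/)

p greedy
# => [["op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3"]]
```

Because that single capture already spans op1 through op3, only "exclude 2" is left after it, so scan finds no further matches and returns one element.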
Upvotes: 0
Views: 1327
Reputation: 54984
Instead of using a regex on HTML, use Nokogiri:
require 'nokogiri'

# Print the text node that immediately follows each <b> element.
Nokogiri::HTML.fragment(str).css('b').each do |b|
  puts b.next.text
end
Upvotes: 9
Reputation: 222040
Change (.*) to (.*?) to make the quantifier lazy (non-greedy):
/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/
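Without a block, String#scan returns an array with one entry per match, each entry holding that match's capture groups. A minimal sketch with the lazy pattern that flattens this into the plain list the question was after (the variable names are only for illustration):

```ruby
html = '<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'

# (.*?) is lazy: it stops at the FIRST <br/> after each </b>,
# so scan finds three separate matches: [["op1"], ["op2"], ["op3"]].
matches = html.scan(/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/)

# flatten turns the per-match group arrays into a flat list of strings.
texts = matches.flatten
p texts  # => ["op1", "op2", "op3"]
```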
Test
'<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/) { |ele|
  puts ele
}
Output:
op1
op2
op3
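An alternative that is not part of this answer is a negated character class: ([^<]*) cannot run past a <, so the capture stops at the next tag without needing a lazy quantifier. A sketch on the same input:

```ruby
html = '<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'

# [^<]* matches any run of characters except '<', so it can never
# swallow the following <br/> or any later tag.
texts = html.scan(/<b>[a-zA-Z0-9]*<\/b>([^<]*)<br\/>/).flatten
p texts  # => ["op1", "op2", "op3"]
```

The trade-off is that this only works while the text between </b> and <br/> contains no < (for example, no nested <i> tag); the lazy (.*?) version still matches in that case.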
Upvotes: 8