Reputation: 10030
I'm struggling to come up with a RegEx that will confirm that some text exists between two tags. Specifically, I want to ensure that the text "TOTAL" and "$19.00" can be found within the same table row.
I'm not planning to nest tables, so I'm not worried about a nested match, but I do want to make sure that my text is within the SAME tr:
<tr style='text-align:right;'>
<td>shipping:</td>
<td style='padding-left:3em;'>$17.00</td>
</tr>
<tr style='text-align:right;'>
<td>TOTAL:</td>
<td style='padding-left:3em;'>$19.00</td>
</tr>
/<tr\b[^>]*>(.*?)<\/tr>/m
It's close: the second match has my content. What do I need to change so that only the second match is returned?
You can play with it on Rubular here
Upvotes: 1
Views: 985
Reputation: 434665
I think an HTML parser and a bit of XPath would be a better call than a regex. Something like this:
require 'nokogiri'

shipping = '//td[normalize-space(text())="shipping:"]/following-sibling::td[normalize-space(text())]'
total = '//td[normalize-space(text())="TOTAL:"]/following-sibling::td[normalize-space(text())]'
doc = Nokogiri::HTML <<HTML
<tr style='text-align:right;'>
<td> shipping: </td>
<td style='padding-left:3em;'>$17.00</td>
</tr>
<tr style='text-align:right;'>
<td>TOTAL:</td>
<td style='padding-left:3em;'>$19.00</td>
</tr>
HTML
has_shipping = doc.xpath(shipping).count == 1 # true
has_total = doc.xpath(total ).count == 1 # true
But without the $17.00 and $19.00:
doc = Nokogiri::HTML <<HTML
<tr style='text-align:right;'>
<td> shipping: </td>
<td style='padding-left:3em;'> </td>
</tr>
<tr style='text-align:right;'>
<td>TOTAL:</td>
<td style='padding-left:3em;'></td>
</tr>
HTML
has_shipping = doc.xpath(shipping).count == 1 # false
has_total = doc.xpath(total ).count == 1 # false
If you want to verify the format of the price as well, then you can find just the <td>s you want and apply whatever Enumerable methods make sense in your situation:
shipping = '//td[normalize-space(text())="shipping:"]/following-sibling::td'
good_one = doc.xpath(shipping).count { |n| n.content =~ /\A\s*\$\d+\.\d{2}\s*\z/ } == 1
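If a parser really isn't an option, here is a rough regex-only sketch of the "same row" check the question asks about. It reuses the question's own pattern to split the markup into rows, then tests each row for both strings, so a match can't straddle two rows. The helper name row_with_all? is made up for illustration, and the whole thing assumes rows are never nested (as the question says) and that the text appears literally in the markup.

```ruby
# Hypothetical helper: true when a single <tr>...</tr> contains every needle.
# Assumes table rows are never nested, as stated in the question.
def row_with_all?(html, *needles)
  # /m lets . match newlines in Ruby; the lazy .*? stops at the first </tr>,
  # so each element of rows is the inner content of exactly one row.
  rows = html.scan(/<tr\b[^>]*>(.*?)<\/tr>/m).flatten
  rows.any? { |row| needles.all? { |needle| row.include?(needle) } }
end

html = <<~HTML
  <tr style='text-align:right;'>
  <td>shipping:</td>
  <td style='padding-left:3em;'>$17.00</td>
  </tr>
  <tr style='text-align:right;'>
  <td>TOTAL:</td>
  <td style='padding-left:3em;'>$19.00</td>
  </tr>
HTML

same_row   = row_with_all?(html, 'TOTAL', '$19.00') # both strings sit in the second row
cross_rows = row_with_all?(html, 'TOTAL', '$17.00') # the strings sit in different rows
```

Here same_row comes out true and cross_rows comes out false. This is still plain string matching on markup, so it will be fooled by things like comments or attribute values that happen to contain the text; the XPath version above is the more robust route.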
Upvotes: 2