TDB

Reputation: 417

How to select only table rows with specific content inside

I'm scraping an email that has many table rows, some of which I want to exclude. The table rows I do need look exactly like:

<tr>
  <td class="quantity"> ANYTHING BUT EMPTY </td>
  <td class="description"> ANYTHING BUT EMPTY </td>
  <td class="price"> ANYTHING BUT EMPTY </td>
</tr>

None of the table rows have a class or id. Moreover, there are unwanted rows that contain cells with these classes but leave some of them empty, so I need only the rows that have all three classes of cells, with all three cells non-empty. I'm not sure of the syntax to do this:

body = Nokogiri::HTML(email)
wanted_rows = body.css('tr').select{ NOT SURE HOW TO ENCAPSULATE LOGIC HERE }

Upvotes: 0

Views: 124

Answers (1)

matt

Reputation: 79743

This is fairly straightforward with XPath:

wanted_rows = body.xpath('//tr[td[(@class = "quantity") and normalize-space()]
  and td[(@class = "description") and normalize-space()]
  and td[(@class = "price") and normalize-space()]]')

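Called with no argument, normalize-space() operates on the current node (the td), and inside a predicate its string result is converted to a boolean, which is true only for a non-empty string. Each call is therefore effectively the same as normalize-space(.) != "", i.e. it checks that the td contains something other than just whitespace.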

Upvotes: 1
