mpora

Reputation: 1479

How to parse TABLE text with Nokogiri?

I am using the Nokogiri gem to parse the contents of an HTML table. One column holds a list of names, some of which are hyperlinked and some of which are not. When I use this code:

puts doc.xpath("//table//tr//td[1]/text()")

it skips the hyperlinked names. I can get the hyperlinked names separately with this:

doc.xpath('//table//tr//td[1]//a[@href]').each do |link|
  puts link.text.strip
end

How can I get all names without having to do it twice?

Upvotes: 2

Views: 423

Answers (1)

Mark Thomas

Reputation: 37527

If you want all text in the cell, hyperlinked or not:

doc.xpath('//td[1]').each do |cell|
  puts cell.text.strip
end

Note: in a valid HTML document, a td is always inside a tr, which is inside a table. Unless you need to target a specific table or row, you can drop the //table//tr prefix, as above.

Upvotes: 1
