Reputation: 227
I need to parse a local HTML file using Nokogiri, but the HTML doesn't have any <div>s with classes. It starts with plain text.
This is the HTML:
high prices in <a href="Example 1">Example 1</a><br>
low prices in <a href="Example 2">Example 2</a><br>
In this case I just need to get "high" and "low", plus "Example 1" and "Example 2".
How can I get the text when there are no wrapping elements? The tutorials I saw all rely on something like <div class= ...> to get the text.
doc.xpath('//a/@href').each do |node| # get performance indicators
  link = node.text
  @test << Entry2.new(link)
end
@title = doc.xpath('//p').text.scan(/^(high|low)/)
My view:
<% @test.each do |entry| %>
  <p><%= entry.link %></p>
<% end %>
<% @title.each do |f| %>
  <p><%= f %></p>
<% end %>
And the output is like this:
Example 1Example 2
[["high"], ["low"]]
It lists everything at once instead of one entry per line. How can I change my Nokogiri code so the output looks like this?
high prices in Example 1
low prices in Example 2
Upvotes: 1
Views: 751
Reputation: 27961
Well, Nokogiri will wrap that string in an implicit <html><body><p>..., so the text will be in a single <p>.
So yes, you will be able to get the links in a structured form with:
doc.xpath "//a"
The "high" and "low" strings will be in a single blob of text. You will probably need to pull them out with a regex, which will depend a lot on your requirements and data, but here's one for what you're showing and asking for:
doc.xpath('//p').text.scan(/^(high|low)/)
I can't be sure how helpful that will specifically be with your actual requirements, but hopefully it gives you a direction to take.
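One detail worth illustrating: because the regex has a capture group, `scan` returns an array of one-element arrays, which is where the `[["high"], ["low"]]` in the question's output comes from. The sketch below is plain Ruby and assumes the `<p>` text and the link texts look like the strings shown (they are stand-ins, not output taken from a real parse); the `rows` name is hypothetical.

```ruby
# Assumed text of the implicit <p>, as doc.xpath('//p').text might return it.
text = "high prices in Example 1\nlow prices in Example 2\n"

# A capture group makes scan return one sub-array per match.
nested = text.scan(/^(high|low)/)    # [["high"], ["low"]]

# A non-capturing group (?:...) returns the matched strings directly.
labels = text.scan(/^(?:high|low)/)  # ["high", "low"]

# Assumed link texts, e.g. from doc.xpath('//a').map(&:text).
links = ["Example 1", "Example 2"]

# Pair each label with its link to build one line per entry.
rows = labels.zip(links).map { |label, link| "#{label} prices in #{link}" }

puts rows
```

In the view, iterating over a single array like `rows` and emitting one `<p>` per element would give one line per entry instead of two separate lists.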
Upvotes: 3