Reputation: 3
I can't get XPath to work on this document because of how its content is encoded. I hope you can help me out.
require "nokogiri"
require "open-uri"
link = "http://www.arla.dk/Services/SearchService.asmx/RecipeResult?q=allRecipe&paging=6&include=&exclude=&area=recipeSearch&languageBranch=da"
doc = Nokogiri::HTML(open(link))
doc.xpath("//h2")
The xpath method returns an empty array. It looks like the document has not been parsed correctly. I think this is because the file being parsed contains encoded characters:
&lt;strong&gt;Frokost til 8&lt;/strong&gt;
&lt;ul&gt;&lt;li class='ingHeading'&gt;&lt;strong&gt;&lt;b&gt;Flade
Upvotes: 0
Views: 841
Reputation: 34156
As stated above, the issue is that the HTML is encoded, which is why you are seeing escape sequences, for example &lt; instead of <. To get around it, unescape the HTML.
"How do I encode/decode HTML entities in Ruby?" suggests using the htmlentities gem.
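As a rough illustration of the unescaping step, here is a minimal sketch that uses CGI.unescapeHTML from Ruby's standard library instead of the htmlentities gem (a swapped-in alternative, not what the linked question recommends). The helper name unescape_markup and the sample string are made up for the example.

```ruby
require "cgi"

# Hypothetical helper: turns entity-escaped markup back into real tags.
# CGI.unescapeHTML handles the basic entities (&lt; &gt; &amp; &quot;) and
# numeric character references; htmlentities covers more named entities.
def unescape_markup(escaped)
  CGI.unescapeHTML(escaped)
end

# Sample input modelled on the escaped snippet from the question.
puts unescape_markup("&lt;strong&gt;Frokost til 8&lt;/strong&gt;")
```

The unescaped string can then be handed to Nokogiri::HTML like any other HTML.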
Upvotes: 0
Reputation: 54984
The response is XML, so first parse it with Nokogiri::XML:
xml = Nokogiri::XML open(link)
Then the first string element contains some HTML, so parse that with Nokogiri::HTML:
doc = Nokogiri::HTML xml.at('string').text
Now you can do your search:
doc.xpath '//h2'
Upvotes: 1