Reputation: 2259
I am trying to read an XML file from a third party with Nokogiri in my Rails project.
One of the nodes I have to parse contains a URL with unescaped ampersands (like foo.com/index.html?page=1&query=bar).
I understand that this is considered malformed XML, and Nokogiri just tries to parse it anyway, resulting in foo.com/index.html?page=1=bar.
How can I obtain the full URL? Can I tweak Nokogiri? Would you do a search-and-replace pre-run, or what would be the best practice?
Upvotes: 2
Views: 980
Reputation: 468
I had the same issue when parsing SVGs whose image links contain ampersands.
Parsing the SVG as HTML first seems to handle the links correctly, escaping the &.
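require 'nokogiri'

# Round-trip through the HTML parser, which seems to escape bare ampersands on output
fixed_svg = Nokogiri::HTML.fragment(raw_svg).to_html
# proceed with XML parsing
svg = Nokogiri::XML(fixed_svg)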
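If you would rather go the search-and-replace route mentioned in the question, here is a minimal sketch in plain Ruby. The method name `escape_bare_ampersands` and the constant `BARE_AMPERSAND` are my own, not part of Nokogiri. It assumes the feed has no CDATA sections (an ampersand inside CDATA would be escaped too, which is wrong) and that any `&name;` sequence is an intended entity reference.

```ruby
# Matches an ampersand that does NOT start an entity or character reference
# such as &amp; , &#38; or &#x26; (hypothetical helper constant).
BARE_AMPERSAND = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/

# Returns a copy of the input with every bare ampersand escaped as &amp;.
# Ampersands that already start a reference are left untouched.
def escape_bare_ampersands(xml)
  xml.gsub(BARE_AMPERSAND, '&amp;')
end

raw = '<link>foo.com/index.html?page=1&query=bar</link>'
puts escape_bare_ampersands(raw)
# prints: <link>foo.com/index.html?page=1&amp;query=bar</link>
```

The cleaned string can then be passed to Nokogiri::XML as usual, and the text of the node should come back with the full URL, since the parser decodes &amp; to &.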
Upvotes: 2