henrik242

Reputation: 386

URL is altered after parsing with Nokogiri

str = "<?xml version='1.0' encoding='utf-8'?><url>https://somehost?p1=v1&p2=v2</url>"
=> "<?xml version='1.0' encoding='utf-8'?><url>https://somehost?p1=v1&p2=v2</url>"

x = Nokogiri::XML(str)
=> #<Nokogiri::XML::Document:0x3fcaa893b900 name="document" children=[#<Nokogiri::XML::Element:0x3fcaa893b644 name="url" children=[#<Nokogiri::XML::Text:0x3fcaa893b48c "https://somehost?p1=v1=v2">]>]>

Why is '&p2' removed after parsing?

Upvotes: 0

Views: 35

Answers (1)

tadman

Reputation: 211680

This is because & has special meaning in an XML/HTML context: it starts an entity reference. You must escape it as &amp;:

<?xml version='1.0' encoding='utf-8'?><url>https://somehost?p1=v1&amp;p2=v2</url>

The parser reads &p2 as the start of an entity reference. That isn't a valid entity, so it gets dropped, leaving you with p1=v1=v2.
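If you're building the XML string yourself, one way to avoid this is to escape the URL before interpolating it. Here's a minimal sketch using CGI.escapeHTML from Ruby's standard library; the variable names (raw_url, escaped_url, xml) are just for illustration, and the Nokogiri calls are left as comments so the snippet runs without the gem:

```ruby
require "cgi"

# The raw URL, with a bare ampersand between query parameters
raw_url = "https://somehost?p1=v1&p2=v2"

# CGI.escapeHTML replaces & with &amp; (and also escapes <, >, " and ')
escaped_url = CGI.escapeHTML(raw_url)

# Interpolate the escaped value into the document
xml = "<?xml version='1.0' encoding='utf-8'?><url>#{escaped_url}</url>"
puts xml

# With Nokogiri loaded, the text node should come back as the raw URL:
#   Nokogiri::XML(xml).at("url").text
#
# To catch unescaped input rather than have it silently repaired, you can
# inspect the document's errors or parse in strict mode:
#   Nokogiri::XML(str).errors
#   Nokogiri::XML(str) { |config| config.strict }
```

In strict mode Nokogiri should raise Nokogiri::XML::SyntaxError on the original string instead of recovering, which makes this kind of problem much easier to spot.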

Upvotes: 3
