lxml parse url ampersand issue

Question

I have a URL that looks something like this:

url = 'http://localhost:8080/?q=abc%26def&other_params=here'

When I open this URL in a browser, it returns an XML document.

I am trying to parse the response from that URL with lxml:

from lxml import etree

tree = etree.parse(url)

The problem is that lxml percent-encodes the % character itself, so the URL it actually requests becomes:

url = 'http://localhost:8080/?q=abc%2526def&other_params=here'

If I don't encode the value of my q parameter, the bare & ends the parameter early and the URL no longer means what I intend:

url = 'http://localhost:8080/?q=abc&def&other_params=here'
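To see why the first URL is already the correct one, here is a small sketch using Python 3's urllib.parse (not lxml). urlencode builds the query with the & inside the value escaped as %26, and parse_qs decodes each of the three query strings above the way a typical server would (an assumption; the actual server may parse differently). The parameter values are the ones from the question.

```python
from urllib.parse import urlencode, parse_qs

# Let the standard library percent-encode the value: '&' inside q becomes '%26'.
query = urlencode({'q': 'abc&def', 'other_params': 'here'})
url = 'http://localhost:8080/?' + query
print(url)  # http://localhost:8080/?q=abc%26def&other_params=here

# Decode the three query strings from the question to see what a server would get.
print(parse_qs('q=abc%26def&other_params=here'))    # q is 'abc&def' (intended)
print(parse_qs('q=abc%2526def&other_params=here'))  # q is 'abc%26def' (double-encoded)
print(parse_qs('q=abc&def&other_params=here'))      # q is 'abc'; 'def' is dropped
```

Only the single-encoded form round-trips to abc&def, which is why the URL must reach the server exactly as written.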

Is there any way to tell lxml not to encode the characters in that URL before sending the request?

Martijn Pieters · Accepted Answer

I'd say that's a bug in lxml's URL handling. Check the lxml issue tracker for an existing report, and file one if it isn't there yet.

The workaround for now is to retrieve the URL yourself with urllib2 (urllib.request in Python 3) and pass the response object to lxml:

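import urllib2
from lxml import etree

# urllib2 sends the URL exactly as given, so %26 is not re-encoded
resp = urllib2.urlopen(url)
# etree.parse accepts any file-like object, including an HTTP response
tree = etree.parse(resp)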
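On Python 3, urllib2 was folded into urllib.request. A minimal sketch of the same workaround, assuming Python 3; parse_url is a hypothetical helper name, and the fallback to the standard library's xml.etree.ElementTree when lxml is not installed is added here for illustration, not part of the answer above.

```python
from urllib.request import urlopen

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib parser, which also has parse(fileobj).
    import xml.etree.ElementTree as etree


def parse_url(url):
    """Hypothetical helper: fetch url without re-quoting it and parse the body as XML."""
    # urlopen sends the path and query as given, so '%26' reaches the server intact.
    with urlopen(url) as resp:
        # Both parsers accept a file-like object such as the HTTP response.
        return etree.parse(resp)
```

Usage: tree = parse_url('http://localhost:8080/?q=abc%26def&other_params=here'), then work with tree.getroot() as usual. Fetching separately keeps URL handling in urllib, so lxml never sees (or re-encodes) the URL.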
