Reputation: 1081
I was wondering if there are known workarounds for some odd behavior I'm seeing with Python's urlparse.
Here are the results of a few lines in the Python interpreter:
>>> import urlparse
>>> urlparse.parse_qsl('https://localhost/?code=bork&charlie=brown')
[('https://localhost/?code', 'bork'), ('charlie', 'brown')]
In the above example, why is the key for the first value 'https://localhost/?code'? Shouldn't it just be 'code'? Note: parse_qs has the same bad behavior.
>>> urlparse.urlparse('abcd://location/?code=bork&charlie=brown')
ParseResult(scheme='abcd', netloc='location', path='/?code=bork&charlie=brown', params='', query='', fragment='')
>>> urlparse.urlparse('https://location/?code=bork&charlie=brown')
ParseResult(scheme='https', netloc='location', path='/', params='', query='code=bork&charlie=brown', fragment='')
In the above example, note that the query string doesn't always end up in the query field. Why does the protocol matter at all? Shouldn't the query field always get the query string? Testing with 'ftp' or other well-known protocols also gives unexpected results.
Upvotes: 1
Views: 1008
Reputation: 23856
urlparse.parse_qsl (and urlparse.parse_qs) are functions intended for the query part of the URL only (the string after the ?).
Maybe you want to use a function that understands whole URLs first (urlparse.urlparse), and then pass the query from the result to urlparse.parse_qs (or urlparse.parse_qsl):
>>> import urlparse
>>> myurl = urlparse.urlparse('https://localhost/?code=bork&charlie=brown')
>>> print myurl
ParseResult(scheme='https', netloc='localhost', path='/', params='', query='code=bork&charlie=brown', fragment='')
>>> print myurl.scheme
https
>>> print urlparse.parse_qs(myurl.query)
{'charlie': ['brown'], 'code': ['bork']}
The scheme matters because, although the query component exists in the generic URI syntax, some protocols may not support it.
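If you need the query string regardless of how the parser treats the scheme, one possible workaround is to split the URL by hand before calling parse_qsl. This is only a minimal sketch: the helper name query_pairs is made up for illustration, and it assumes the query is everything between the first ? and any # fragment.

```python
try:
    from urllib.parse import parse_qsl  # Python 3
except ImportError:
    from urlparse import parse_qsl  # Python 2


def query_pairs(url):
    """Return (key, value) pairs from the query part of url, whatever its scheme.

    Hypothetical helper, not part of the standard library.
    """
    # Drop the fragment first, then keep everything after the first '?'.
    without_fragment = url.partition('#')[0]
    query = without_fragment.partition('?')[2]
    return parse_qsl(query)


print(query_pairs('abcd://location/?code=bork&charlie=brown'))
```

Because the split never looks at the scheme, the made-up abcd scheme is handled the same way as https.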
See also:
http://en.wikipedia.org/wiki/URI_scheme (check out the official registered schemes)
Upvotes: 3
Reputation: 4056
The documentation for urlparse.parse_qs (and parse_qsl) states that it's meant to "Parse a query string given as a string argument." You're not giving it a query string; you're giving it the whole URL. Try this instead:
>>> urlparse.parse_qsl('code=bork&charlie=brown')
[('code', 'bork'), ('charlie', 'brown')]
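To avoid passing a whole URL by mistake, the two steps can be wrapped in a small function. This is a sketch, not an official API: pairs_from_url is a hypothetical name, and the import falls back to the Python 2 module used elsewhere in this thread.

```python
try:
    from urllib.parse import urlsplit, parse_qsl  # Python 3
except ImportError:
    from urlparse import urlsplit, parse_qsl  # Python 2


def pairs_from_url(url):
    """Split a full URL first, then parse only its query component.

    Hypothetical convenience wrapper for illustration.
    """
    return parse_qsl(urlsplit(url).query)


print(pairs_from_url('https://localhost/?code=bork&charlie=brown'))
```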
Upvotes: 0