Reputation: 8098
How do I properly construct urls with query strings?
For example, a website contains the link www.abc.com/SomethingHere?x=1&y=2
however, the value I get upon scraping is www.abc.com/SomethingHere?x=1&amp;y=2
Sometimes there's also a weird %xx at the end that I don't understand. Requests made with these modified strings fail (but work if I manually remove the amp and the percent weirdness). It also makes me afraid of adding more query parameters by just appending them, as in www.abc.com/SomethingHere?x=1&y=2&z=3
How do I make sure I get the proper urls?
Upvotes: 0
Views: 178
Reputation: 4006
Do it in two steps:
>>> import urllib.parse
>>> # first parse the url
>>> parsed = urllib.parse.urlparse('www.abc.com/SomethingHere?x=1&y=2')
>>> parsed
ParseResult(scheme='', netloc='', path='www.abc.com/SomethingHere', params='', query='x=1&y=2', fragment='')
>>> # then parse the query string component (into a dictionary)
>>> q = parsed.query
>>> urllib.parse.parse_qs(q)
{'x': ['1'], 'y': ['2']}
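The "amp" residue described in the question looks like HTML entity escaping (an `&amp;` where a plain `&` belongs), although that is an assumption about how the page was scraped. Under that assumption, a minimal sketch is to unescape the raw value before parsing it. The helper name `clean_scraped_url` is hypothetical, and the sample string is the one from the question.

```python
import html
from urllib.parse import urlsplit, parse_qs

def clean_scraped_url(raw):
    """Undo HTML entity escaping (e.g. '&amp;' -> '&') in a scraped URL.

    Hypothetical helper: assumes the value came from HTML source,
    where '&' inside attribute values is written as '&amp;'.
    """
    return html.unescape(raw.strip())

raw = "www.abc.com/SomethingHere?x=1&amp;y=2"
cleaned = clean_scraped_url(raw)

# parse_qs also decodes percent-escapes (%xx) in parameter values
params = parse_qs(urlsplit(cleaned).query)
print(cleaned)
print(params)
```

If the scraping library already returns decoded attribute values, this step is unnecessary; it only matters when the URL is pulled out of raw HTML text.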
Upvotes: 2
Reputation: 9413
You can have a look at urlparse in Python's urllib.parse module. Calling urlparse on your URL, we get something like:
from urllib.parse import urlparse, urljoin
urlparse('www.abc.com/SomethingHere?x=1&y=2&z=3%%xx')
Output: ParseResult(scheme='', netloc='', path='www.abc.com/SomethingHere', params='', query='x=1&y=2&z=3%%xx', fragment='')
To replace the query string you can further use urljoin, as follows (note that the relative reference '?x=2' replaces the entire query, not just x):
urljoin('www.abc.com/SomethingHere?x=1&y=2&z=3%%xx', '?x=2')
Output: 'www.abc.com/SomethingHere?x=2'
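Since urljoin swaps out the whole query string, changing or adding a single parameter needs a different approach. One possible sketch is to parse the query into a dict, update it, and re-encode it with urlencode, which also takes care of escaping. The helper name `set_query_params` is hypothetical, and the sketch assumes each parameter name appears only once in the URL.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def set_query_params(url, **updates):
    """Return url with the given query parameters added or overwritten.

    Hypothetical helper: repeated keys in the original query are
    collapsed to their last value because a plain dict is used.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(updates)
    # urlencode converts values to str and percent-escapes them
    return urlunsplit(parts._replace(query=urlencode(params)))

added = set_query_params("www.abc.com/SomethingHere?x=1&y=2", z=3)
changed = set_query_params("www.abc.com/SomethingHere?x=1&y=2", x=2)
print(added)
print(changed)
```

Building the query with urlencode rather than string concatenation avoids having to worry about where `?` and `&` go or whether a value needs escaping.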
Upvotes: 0