BigBoy1337

Reputation: 4973

How to parse and then unparse a url query string so that it ends up in the same format/encoding as before?

Is there a way to take a URL, parse it to get the query, edit the query in Python, and then rebuild the URL so that it is exactly the same (same format, encoding, etc.)? Here is what I have tried using urllib functions:

>>> from urllib.parse import urlparse, urlunparse, parse_qs, urlencode
>>> working_url
'https://<some-netloc>/reports/sales-order-history?page=&sort_direction=&sort_column=&filter%5Bsearch%5D=&filter%5Bofficial%5D%5B0%5D%5Bname%5D=status&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=Pending%2CProcessing%2CReady%20to%20ship%2CDelivering%2CDelivered%2CCompleted&filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z'
>>> working_parse = urlparse(working_url)
>>> working_parse
ParseResult(scheme='https', netloc='<some-netloc>', path='/reports/sales-order-history', params='', query='page=&sort_direction=&sort_column=&filter%5Bsearch%5D=&filter%5Bofficial%5D%5B0%5D%5Bname%5D=status&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=Pending%2CProcessing%2CReady%20to%20ship%2CDelivering%2CDelivered%2CCompleted&filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z', fragment='')
>>> working_query_dict = parse_qs(working_parse.query)

Here is where I would edit working_query_dict, for instance to change those timestamps. I then use urlencode to encode the dictionary again and urlunparse to turn it back into a working URL.

>>> working_query_dict
{'filter[official][0][name]': ['status'], 'filter[official][0][value]': ['Pending,Processing,Ready to ship,Delivering,Delivered,Completed'], 'filter[official][1][name]': ['orderDate'], 'filter[official][1][value]': ['2020-05-10T07:00:00.000Z,2020-05-18T06:59:59.999Z']}
>>> urlunparse((working_parse.scheme,working_parse.netloc,working_parse.path,working_parse.params,urlencode(working_query_dict),working_parse.fragment))
'https://<some-netloc>/reports/sales-order-history?filter%5Bofficial%5D%5B0%5D%5Bname%5D=%5B%27status%27%5D&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=%5B%27Pending%2CProcessing%2CReady+to+ship%2CDelivering%2CDelivered%2CCompleted%27%5D&filter%5Bofficial%5D%5B1%5D%5Bname%5D=%5B%27orderDate%27%5D&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=%5B%272020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z%27%5D'

However, the URL that gets formed doesn't work: it doesn't resolve to the same place on the website. Even at a glance you can tell it has changed, even though I didn't modify any parameters.

I'm thinking I may need to detect the encoding or format when calling parse_qs, and then use that same format when calling urlencode. How can I do this?

Upvotes: 0

Views: 2278

Answers (1)

BigBoy1337

Reputation: 4973

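OK, the key is the urlencode argument quote_via=urllib.parse.quote, which encodes spaces as %20 rather than the default +. Additionally, parse_qs can be replaced with parse_qsl, which returns a list of (key, value) pairs and so preserves the order of the parameters. It also avoids the list-valued dictionary that urlencode was stringifying into %5B%27...%27%5D. Passing keep_blank_values=True to that function keeps even the blank parameters, if you want an absolutely true match.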
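
To make the difference concrete, here is a minimal sketch comparing the two approaches. The query string is a shortened, made-up one in the same style as the question's, so treat the specific parameter names as assumptions.

```python
from urllib.parse import parse_qs, parse_qsl, quote, urlencode

# Shortened, hypothetical query in the same style as the one in the question.
query = "page=&filter%5Bsearch%5D=&filter%5Bvalue%5D=Ready%20to%20ship%2CDelivered"

# parse_qs drops blank parameters by default and wraps every value in a list.
as_dict = parse_qs(query)
# Without doseq=True, urlencode calls str() on each list, so the brackets and
# quotes of "['...']" get percent-encoded, and spaces are written as "+".
lossy = urlencode(as_dict)

# parse_qsl keeps the original order, keep_blank_values=True keeps "page=",
# and quote_via=quote writes spaces as %20 instead of "+".
pairs = parse_qsl(query, keep_blank_values=True)
faithful = urlencode(pairs, quote_via=quote)

print(lossy)              # filter%5Bvalue%5D=%5B%27Ready+to+ship%2CDelivered%27%5D
print(faithful == query)  # True
```

If you do want to keep using parse_qs, passing doseq=True to urlencode should at least stop the lists from being stringified, though the blank parameters would still be lost unless keep_blank_values=True is also passed.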

So now this works for me:

>>> from urllib.parse import quote, parse_qsl, urlencode
>>> urlencode(parse_qsl(working_parse.query, keep_blank_values=True), quote_via=quote) == working_parse.query
True

It takes a complicated query (whose parameters you could edit if you want), parses it, and urlencodes it back to the original query string.
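
For the actual editing step, something along these lines might work. It is only a sketch: replace_query_value is a hypothetical helper name, example.com stands in for the real host, and the new timestamps are made up.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlparse, urlunparse


def replace_query_value(url, key, new_value):
    """Return url with every occurrence of key set to new_value.

    Hypothetical helper for illustration, not part of urllib.
    """
    parts = urlparse(url)
    # Decode into ordered (key, value) pairs, keeping blank parameters.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    edited = [(k, new_value if k == key else v) for k, v in pairs]
    # Re-encode with quote so spaces become %20 rather than "+".
    new_query = urlencode(edited, quote_via=quote)
    return urlunparse(parts._replace(query=new_query))


# Placeholder host and a shortened version of the question's query.
url = ("https://example.com/reports/sales-order-history?page=&"
       "filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&"
       "filter%5Bofficial%5D%5B1%5D%5Bvalue%5D="
       "2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z")

new_url = replace_query_value(
    url,
    "filter[official][1][value]",  # keys are matched in their decoded form
    "2020-06-01T07:00:00.000Z,2020-06-08T06:59:59.999Z",
)
print(new_url)
```

One caveat: the round trip is only byte-for-byte exact when the original URL already uses the encoding that quote produces (%20 for spaces, uppercase hex digits, and reserved characters such as commas and colons percent-encoded). A URL that used + for spaces or left commas unencoded would come back semantically equal but not identical.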

Upvotes: 2
