SERPY

Reputation: 59

Python: How to only URL Encode a specific URL Parameter?

I have some big URLs that contain a lot of URL parameters.

For my specific case, I need to URL-encode the content of one specific URL parameter (q) when the content after "q=" starts with a slash ("/").

Example URL:

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"

How can I URL-encode only the last part of the URL, i.e. the value of the "q" parameter?

The output of this example should be:

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22%20

I already tried a few different things with urllib.parse, but it doesn't work the way I want.

Thanks for your help!

Upvotes: 0

Views: 2110

Answers (2)

Edo Akse

Reputation: 4401

Split the string on the &q=/ part and only encode the last part:

from urllib import parse

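url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
# split once on "&q=/" so only a q value that starts with "/" is matched
prefix, rest = url.split("&q=/", 1)
# re-attach the leading "/" so it is encoded (as %2F) along with the rest
encoded = parse.quote_plus("/" + rest)
encoded_url = f"{prefix}&q={encoded}"
print(encoded_url)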

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22

Note that this differs from the requested output: the requested output ends with a URL-encoded space (%20), which does not appear in the example input.
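If the value really could end with a space and %20 (rather than +) is wanted, the choice of quoting function matters. The snippet below is a small illustration using a hypothetical value with a trailing space; quote, quote_plus and the safe parameter are standard urllib.parse functions.

```python
from urllib.parse import quote, quote_plus

value = '/"TEST"/"TEST" '  # hypothetical value with a trailing space

# quote() treats "/" as safe by default, so slashes are left as-is
print(quote(value))            # /%22TEST%22/%22TEST%22%20
# safe="" makes quote() encode "/" too; the space becomes %20
print(quote(value, safe=""))   # %2F%22TEST%22%2F%22TEST%22%20
# quote_plus() also encodes "/" by default, but turns the space into "+"
print(quote_plus(value))       # %2F%22TEST%22%2F%22TEST%22+
```

So quote(value, safe="") would reproduce the requested %20 form exactly, while quote_plus gives the form-style + for spaces.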


EDIT

A comment showed a different requirement (other parameters can follow q, as in the URL below), so the code needs to change a bit. The code below only encodes the value of the q= parameter. First split the URL into the base URL and the parameters, then iterate through the parameters to find the q= parameter and encode its value. A bit of f-string and join magic then gives a URL with only the q parameter encoded. Note that this can break if the part that needs to be encoded contains an &.

from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?", 1)
newparameters = []
for parameter in parameters.split("&"):
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22&utm_source=test1&cpc=123&gclid=abc123
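Splitting on "?" by hand leaves any #fragment attached to the last parameter. A minimal sketch of the same idea that lets urllib.parse.urlsplit separate the URL components instead; the function name encode_q_if_slash is hypothetical, and it keeps the same assumption as the code above (no & inside the q value).

```python
from urllib.parse import urlsplit, urlunsplit, quote_plus


def encode_q_if_slash(url: str) -> str:
    """Percent-encode the value of the 'q' parameter when it starts with '/'.

    Hypothetical helper. Assumes parameters are separated by '&' and that
    the q value itself contains no '&'.
    """
    parts = urlsplit(url)
    params = []
    for param in parts.query.split("&"):
        # partition keeps parameters without "=" intact (sep is then "")
        name, sep, value = param.partition("=")
        if name == "q" and value.startswith("/"):
            value = quote_plus(value)
        params.append(f"{name}{sep}{value}")
    # urlunsplit reassembles scheme, netloc, path, query and fragment
    return urlunsplit(parts._replace(query="&".join(params)))


url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"&utm_source=test1&cpc=123&gclid=abc123'
print(encode_q_if_slash(url))
```

With a URL such as https://a.b/x?q=/a"b#frag, the fragment stays outside the encoded value and the result is https://a.b/x?q=%2Fa%22b#frag.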

EDIT 2

Trying to solve the edge case where the string to be encoded contains an & character, which breaks string.split("&").
I tried using urllib.parse.parse_qs(), but it has the same issue with the & character (see the urllib.parse documentation for reference).
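The parse_qs problem can be seen directly with parse_qsl(), which parse_qs() builds on. A short illustration on a shortened, hypothetical query string taken from the example below:

```python
from urllib.parse import parse_qsl

query = 'test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1'

# parse_qsl splits on every "&", so the q value is cut short and the
# leftover pieces (which contain no "=") are silently dropped
print(parse_qsl(query))
# keep_blank_values=True keeps those pieces, but as separate empty parameters
print(parse_qsl(query, keep_blank_values=True))
```

Either way the q value comes back as '/"TEST"/' instead of the full string, which is why the raw value has to be reassembled before encoding.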

This question is a nice example of how edge cases can mess up simple logic and make it overly complicated.

RFC 3986 also doesn't specify any restrictions on the names of query-string parameters; otherwise those restrictions could have been used to narrow down possible errors even more.
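The same heuristic (a piece without "=" belongs to the previous value) can also be expressed as a single regular expression. This is a sketch under the same assumptions, not a robust parser: the pattern and the encode_q name are hypothetical, the q value is assumed to start with "/", and it runs until the next "&name=" or the end of the URL, so a #fragment would be swallowed into it.

```python
import re
from urllib.parse import quote_plus

# Hypothetical heuristic: after "?q=" or "&q=", the value starts with "/" and
# lazily extends until the next "&name=" (name without "&" or "=") or the end.
# Assumes the URL has no #fragment.
Q_PARAM = re.compile(r"([?&]q=)(/.*?)(?=&[^&=]+=|$)")


def encode_q(url: str) -> str:
    # keep the "&q=" prefix as-is and percent-encode only the captured value
    return Q_PARAM.sub(lambda m: m.group(1) + quote_plus(m.group(2)), url, count=1)


url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123&gclid=abc123'
print(encode_q(url))
```

Like the loop version, this still misreads a value that contains something shaped like "&name=".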

updated code

from urllib import parse


url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?", 1)

# Addition to handle & in the query string.
# It reduces errors, but it can still go wrong if there's an = in the part to be encoded.
split_parameters = []
for parameter in parameters.split("&"):
    if "=" not in parameter and split_parameters:
        # a piece without "=" is assumed to belong to the previous parameter's value
        split_parameters[-1] += f"&{parameter}"
    else:
        split_parameters.append(parameter)


newparameters = []
for parameter in split_parameters:
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123
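One way to sanity-check the result is to parse it back with the standard library: once q is percent-encoded, parse_qsl recovers the original value, including the & characters that broke the naive split. A short check on the output above:

```python
from urllib.parse import urlsplit, parse_qsl

encoded_url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123'

# parse_qsl decodes each value, so %26 turns back into "&" inside q only
params = dict(parse_qsl(urlsplit(encoded_url).query))
print(params["q"])  # /"TEST"/&"TE&eeST"
```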

Upvotes: 1

RufusVS

Reputation: 4127

@EdoAkse has a good answer and should get the credit for it.

But the purist in me would do the same thing slightly differently, because

(1) I don't like calling the same function on the same data twice (for efficiency), and

(2) I like the logical symmetry of using the join function to reverse a split.

My code would look more like this:

from urllib import parse

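url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
splitter = "&q="
# split only once, in case "&q=" also appears inside the value
unencoded, value = url.split(splitter, 1)
# only encode the value when it starts with a slash, as the question requires
if value.startswith("/"):
    value = parse.quote_plus(value)
# str.join takes a single iterable, so pass both parts as a list
encoded_url = splitter.join([unencoded, value])
print(encoded_url)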
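The symmetry argument in point (2) rests on a simple property: joining with the same separator undoes a split, for any string and any maxsplit. A tiny illustration:

```python
url = 'https://www.exmple.com/test?test1=abc&q=/"TEST"/"TEST"'
splitter = "&q="

# split once, then join the pieces back with the same separator
parts = url.split(splitter, 1)
print(parts)
assert splitter.join(parts) == url  # join reverses the split exactly
```

That is why the encoded value can be dropped into the second slot and joined back without rebuilding the URL by hand.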

Edit: I couldn't resist posting my edited answer based on the comments. As you can see, it is virtually identical to the code developed independently above. This must be the right approach then, I guess.

from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
base_url, arglist = url.split("?", 1)
args = arglist.split("&")
new_args = []
for arg in args:
    if arg.lower().startswith("q="):
        new_args.append(arg[:2] + parse.quote_plus(arg[2:]))
    else:
        new_args.append(arg)
encoded_url = "?".join([base_url, "&".join(new_args)])
print(encoded_url)

Upvotes: 1
