Giuseppe Lodi Rizzini

Reputation: 1055

Modify every URL inside a string and return the modified string in Python

I need a function that finds every URL inside a string, lets me manipulate each one, and then rebuilds the original string with the modified URLs in place.

What I tried:

old_msg = 'This is an url https://ebay.to/3bxNNfj e this another one https://amzn.to/2QBsX7t'

def manipulate_url(url):
    # example manipulation; the real code replaces query tags and does other, more complex things
    if 'ebay' in url:
        new_url = url + "/another/path/"
    if 'amzn' in url:
        new_url = url + "/lalala/path/"
    return new_url

result = re.sub('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', manipulate_url, old_msg)
print(result)

# expected result based on my example:
#This is an url https://ebay.to/3bxNNfj/another/path/ e this another one https://amzn.to/2QBsX7t/lalala/path/

But I get: TypeError: sequence item 1: expected str instance, re.Match found

Upvotes: 1

Views: 74

Answers (1)

Adam.Er8

Reputation: 13413

As the docs for re.sub say, the function you supply receives a match object, not a string, and it must return the replacement string.

To get the URL (the full match), call .group(0) on the match object, like this:

import re

old_msg = 'This is an url https://ebay.to/3bxNNfj e this another one https://amzn.to/2QBsX7t'

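def manipulate_url(match):
    # re.sub passes a match object; group(0) is the full matched URL
    url = match.group(0)
    # example manipulation; the real code replaces query tags and does other, more complex things
    if 'ebay' in url:
        return url + "/another/path/"
    if 'amzn' in url:
        return url + "/lalala/path/"
    # leave any other URL unchanged instead of failing on an unset variable
    return url

# raw string so the backslashes in the pattern reach the regex engine untouched
result = re.sub(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', manipulate_url, old_msg)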
print(result)

Output:

This is an url https://ebay.to/3bxNNfj/another/path/ e this another one https://amzn.to/2QBsX7t/lalala/path/
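Since the question mentions replacing query tags, here is a rough sketch of one way to keep the URL-editing function string-based and unwrap the match object in a small lambda. The names add_tag and rewrite_urls, the 'tag' parameter, its value 'example-21', and the simplified \S+ pattern are all assumptions made for illustration; they are not from the original code.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Simplified pattern (an assumption): a scheme followed by non-space characters.
URL_RE = re.compile(r'https?://\S+')

def add_tag(url, tag='example-21'):
    """Return url with a hypothetical 'tag' query parameter set to tag."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query['tag'] = tag                    # add or overwrite the tag
    return urlunsplit(parts._replace(query=urlencode(query)))

def rewrite_urls(text):
    # re.sub hands the callback a match object; unwrap it with group(0)
    # so that add_tag can stay a plain str -> str function.
    return URL_RE.sub(lambda m: add_tag(m.group(0)), text)

old_msg = 'This is an url https://ebay.to/3bxNNfj e this another one https://amzn.to/2QBsX7t'
print(rewrite_urls(old_msg))
```

Keeping the manipulation as a plain string function makes it easy to test on its own, and urllib.parse handles the query string so an existing "?x=1" is extended rather than duplicated. Note that \S+ will also swallow trailing punctuation such as a full stop right after a URL, so a real pattern may need to be stricter.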

Upvotes: 2
