Reputation: 1055
I need a function that finds all URLs in a string, lets me modify each one, and then rebuilds the original string with the modified URLs.
I tried:
old_msg = 'This is an url https://ebay.to/3bxNNfj e this another one https://amzn.to/2QBsX7t'
def manipulate_url(url):
    #example of manipulation, in real i get query replacement tags and other complex....
    if 'ebay' in url:
        new_url = url + "/another/path/"
    if 'amzn' in url:
        new_url = url + "/lalala/path/"
    return new_url
result = re.sub('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', manipulate_url, old_msg)
print(result)
# expected result based on my example:
# This is an url https://ebay.to/3bxNNfj/another/path/ e this another one https://amzn.to/2QBsX7t/lalala/path/
But I get: TypeError: sequence item 1: expected str instance, re.Match found
Upvotes: 1
Views: 74
Reputation: 13413
As the docs for re.sub say, the function you supply receives a match object, not a string. To get the URL (the full match), call .group(0) on it, like this:
import re

old_msg = 'This is an url https://ebay.to/3bxNNfj e this another one https://amzn.to/2QBsX7t'

def manipulate_url(match):
    url = match.group(0)  # the full matched URL as a string
    #example of manipulation, in real i get query replacement tags and other complex....
    new_url = url  # default: leave unrecognized URLs unchanged
    if 'ebay' in url:
        new_url = url + "/another/path/"
    elif 'amzn' in url:
        new_url = url + "/lalala/path/"
    return new_url

# raw string so the backslashes in the pattern reach the regex engine untouched
result = re.sub(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', manipulate_url, old_msg)
print(result)
Output:
This is an url https://ebay.to/3bxNNfj/another/path/ e this another one https://amzn.to/2QBsX7t/lalala/path/
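If the real manipulation grows beyond two branches, one possible variant is to look up the change by hostname instead of chaining if statements. The sketch below is only an illustration: SUFFIXES, URL_RE and rewrite_urls are hypothetical names that do not appear in the question, and the simplified pattern assumes a URL runs until the next whitespace, which may not hold for every message (for example, a URL followed directly by a full stop).

```python
import re
from urllib.parse import urlparse

# Hypothetical mapping from a hostname fragment to the suffix to append.
SUFFIXES = {
    "ebay": "/another/path/",
    "amzn": "/lalala/path/",
}

# Simplified pattern (an assumption): a URL runs until the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def manipulate_url(match):
    """Replacement callback for re.sub: receives a match object, returns a str."""
    url = match.group(0)          # the full matched URL
    host = urlparse(url).netloc   # e.g. "ebay.to"
    for fragment, suffix in SUFFIXES.items():
        if fragment in host:
            return url + suffix
    return url                    # unknown host: leave the URL unchanged


def rewrite_urls(text):
    """Return text with every matched URL passed through manipulate_url."""
    return URL_RE.sub(manipulate_url, text)


if __name__ == "__main__":
    old_msg = 'This is an url https://ebay.to/3bxNNfj e this another one https://amzn.to/2QBsX7t'
    print(rewrite_urls(old_msg))
```

Returning the original URL for unknown hosts keeps the callback from failing on links it does not recognize, since re.sub expects the function to return a string for every match.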
Upvotes: 2