Reputation: 354
Here is an example scenario for my question. Suppose I have a list of URLs:
url_list=["https:www.example.com/pag31/go","https:www.example.com/pag12/go","https:www.example.com/pag0/go"]
I want to replace the substring between ".com/" and "/go" with "home".
For example, the new URLs should look like this:
['https:www.example.com/home/go','https:www.example.com/home/go','https:www.example.com/home/go']
I have tried slicing and replacing based on index, but couldn't get the required result for the whole list.
Any help is really appreciated. Thanks in advance.
Upvotes: 1
Views: 61
Reputation: 5746
You can use re.sub() together with a list comprehension to apply your logic to every element of your list.
import re
url_list=["https:www.google.com/pag31/go","https:www.facebook.com/pag12/go","http:www.bing.com/pag0/go"]
# Match whatever sits between "com/" and "/go" without consuming either delimiter
pattern = r'(?<=com\/).*(?=\/go)'
result = [re.sub(pattern, 'home', url) for url in url_list]
print(result)
This matches any value found between com/ and /go, so it works for any .com site regardless of whether it uses http or https.
Output:
['https:www.google.com/home/go', 'https:www.facebook.com/home/go', 'http:www.bing.com/home/go']
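To see exactly which part of each URL the pattern selects, you can run re.search() with the same pattern instead of re.sub(). This is a small sketch reusing the URLs above; it shows that only the middle segment is matched, because the lookbehind and lookahead check their text without including it in the match.

```python
import re

url_list = ["https:www.google.com/pag31/go", "https:www.facebook.com/pag12/go", "http:www.bing.com/pag0/go"]
pattern = re.compile(r'(?<=com\/).*(?=\/go)')

# Lookarounds are zero-width, so the match holds only the text between "com/" and "/go"
matched = [pattern.search(url).group() for url in url_list]
print(matched)  # ['pag31', 'pag12', 'pag0']
```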
Regex Explanation
The pattern r'(?<=com\/).*(?=\/go)' looks for the following:
(?<=com\/) : Positive lookbehind that checks that com/ immediately precedes the match
.* : Matches any character except a newline, zero or more times, as many as possible (greedy)
(?=\/go) : Positive lookahead that checks that /go directly follows .*
Because the lookarounds only check for com/ and /go without consuming them, only the text between the two checks is replaced. You can find a more in-depth explanation of the pattern here.
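One consequence of the greedy .* is worth illustrating. The sketch below uses a hypothetical URL that contains "/go" twice: the greedy pattern stretches to the last "/go", while a non-greedy variant (.*?, swapped in here for comparison) stops at the first one. For URLs with a single "/go", like the ones in the question, both give the same result.

```python
import re

# Hypothetical URL with a second "/go" further along the path
url = "https:www.example.com/pag31/go/sub/go"

greedy = r'(?<=com\/).*(?=\/go)'   # .* runs to the LAST "/go"
lazy = r'(?<=com\/).*?(?=\/go)'    # .*? stops at the FIRST "/go"

greedy_result = re.sub(greedy, 'home', url)
lazy_result = re.sub(lazy, 'home', url)
print(greedy_result)  # https:www.example.com/home/go
print(lazy_result)    # https:www.example.com/home/go/sub/go
```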
Upvotes: 2
Reputation: 21
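You can try using Python's regular expressions (the re module). Note that this replaces each matching URL with a fixed new URL, so it only works when all URLs share the same scheme and domain.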
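import re
url_list = ["https:www.example.com/pag31/go", "https:www.example.com/pag12/go", "https:www.example.com/pag0/go"]
# Escape the dots so they match literally; .* matches whatever sits between .com/ and /go
re_url = r"https:www\.example\.com/.*/go"
url = "https:www.example.com/home/go"
# Every URL that matches the pattern is replaced wholesale with the fixed new URL
url_list_new = [re.sub(re_url, url, x) for x in url_list]
print(url_list_new)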
Output:
['https:www.example.com/home/go', 'https:www.example.com/home/go', 'https:www.example.com/home/go']
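For comparison, here is a minimal sketch of the index-based slicing approach mentioned in the question, without regex. The helper name replace_between and its default arguments are hypothetical; it assumes every URL contains ".com/" followed later by "/go" (str.index and str.rindex raise ValueError if a marker is missing).

```python
url_list = ["https:www.example.com/pag31/go", "https:www.example.com/pag12/go", "https:www.example.com/pag0/go"]

def replace_between(url, start_marker=".com/", end_marker="/go", new="home"):
    # Keep everything up to and including the first start_marker
    start = url.index(start_marker) + len(start_marker)
    # Keep everything from the last end_marker onward
    end = url.rindex(end_marker)
    return url[:start] + new + url[end:]

new_list = [replace_between(u) for u in url_list]
print(new_list)
# ['https:www.example.com/home/go', 'https:www.example.com/home/go', 'https:www.example.com/home/go']
```

Unlike the fixed-URL replacement, this keeps each URL's own prefix, so it also works when the domains differ.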
Upvotes: 1