Somnath Paul

Reputation: 190

Python Regular expression - Substitution

I have written this Python code:

import re

url = "www.google.com"
line = "../../asyouwish.html"

num = re.sub(r'(\.\.\/)*', url, line)
print ("Final : ", num)

My intention is to replace ../ (any number of times) with the url value provided. However, I am not getting the correct output. My desired output is "www.google.com/asyouwish.html".

What I get is:

Final :  www.google.comawww.google.comswww.google.comywww.google.comowww.google.comuwww.google.comwwww.google.comiwww.google.comswww.google.comhwww.google.com.www.google.comhwww.google.comtwww.google.commwww.google.comlwww.google.com

Can anyone tell me where I went wrong? Thanks.

Upvotes: 1

Views: 5539

Answers (2)

unutbu

Reputation: 879291

* means 0-or-more occurrences. + means 1-or-more. You want a match to have at least 1 occurrence of ../. So change the * to +:

import re

url = "www.google.com/"
line = "../../asyouwish.html"

num = re.sub(r'([.]{2}/)+', url, line)
print ("Final : ", num)

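yields (this is Python 2 output, where print with two arguments prints a tuple; under Python 3 the line reads Final :  www.google.com/asyouwish.html)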

('Final : ', 'www.google.com/asyouwish.html')

Since re.sub removes the whole run of '../', including the final '/', you'll need to add a forward slash after url. Above, I've added the forward slash to url itself. If url comes without the forward slash, you can instead add it in the call:

num = re.sub(r'([.]{2}/)+', url+'/', line)
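If you don't know in advance whether url ends with a slash, one option is to normalize it first. The following is a minimal sketch under that assumption; the helper name absolutize is hypothetical. It also anchors the pattern with ^ so that only the leading run of '../' is replaced, rather than a '../' that might appear later in the path.

```python
import re

def absolutize(url, line):
    """Hypothetical helper: replace the leading run of '../' in line with url."""
    # Normalize the base so it ends with exactly one '/', whether or not url had one
    base = url.rstrip("/") + "/"
    # ^ anchors the match at the start of line; + requires at least one '../'.
    # A function replacement inserts base literally
    # (backslashes in a string replacement would be treated as escapes).
    return re.sub(r"^([.]{2}/)+", lambda m: base, line)

print(absolutize("www.google.com", "../../asyouwish.html"))   # www.google.com/asyouwish.html
print(absolutize("www.google.com/", "../../asyouwish.html"))  # www.google.com/asyouwish.html
```

A line with no leading '../' is returned unchanged (e.g. "asyouwish.html" stays "asyouwish.html"); whether that is what you want depends on your use case.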

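When you match on 0-or-more occurrences, r'([.]{2}/)*', the pattern can also match the empty string. So after the leading '../../' is replaced, every remaining position between the characters in line (and the end of the string) matches as well, and you get a substitution at each interstice. (Since Python 3.7, re.sub also replaces an empty match that directly follows a non-empty one, so on current versions the output begins with the URL twice.)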

In [9]: x = 'www.google.comawww.google.comswww.google.comywww.google.comowww.google.comuwww.google.comwwww.google.comiwww.google.comswww.google.comhwww.google.com.www.google.comhwww.google.comtwww.google.commwww.google.comlwww.google.com'

In [13]: x.split('www.google.com')
Out[13]: ['', 'a', 's', 'y', 'o', 'u', 'w', 'i', 's', 'h', '.', 'h', 't', 'm', 'l', '']
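The empty matches can also be inspected directly by listing each match's span with re.finditer. This is a small sketch using the question's line; it shows that the 0-or-more pattern matches '../../' once and then matches the empty string repeatedly, while the 1-or-more pattern matches exactly once.

```python
import re

line = "../../asyouwish.html"

# (start, end) span of every match of the 0-or-more pattern
star_spans = [m.span() for m in re.finditer(r"([.]{2}/)*", line)]
print(star_spans[0])  # (0, 6): the leading '../../'
# Every later match is empty (start == end), one per remaining position
print(all(start == end for start, end in star_spans[1:]))  # True

# The 1-or-more pattern cannot match the empty string, so it matches once
plus_spans = [m.span() for m in re.finditer(r"([.]{2}/)+", line)]
print(plus_spans)  # [(0, 6)]

# re.subn also reports how many substitutions were made
print(re.subn(r"([.]{2}/)+", "www.google.com/", line))  # ('www.google.com/asyouwish.html', 1)
```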

Upvotes: 5

scottydelta

Reputation: 1806

Use something like:

url = "www.google.com"
line = "../../asyouwish.html"
link_part = line.split("/")  # ['..', '..', 'asyouwish.html']

# Keep only the last path segment (the file name)
final_url = url + "/" + link_part[-1]
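Taking link_part[-1] keeps only the last path segment, so a hypothetical input like "../../docs/page.html" would lose its "docs/" directory. A minimal non-regex sketch that strips only the leading '../' prefixes and keeps everything after them (the helper name join_relative is hypothetical):

```python
def join_relative(url, line):
    """Hypothetical helper: drop leading '../' segments, then join the rest to url."""
    rest = line
    # Remove one leading '../' per iteration; any subdirectories after it are kept
    while rest.startswith("../"):
        rest = rest[len("../"):]
    return url + "/" + rest

print(join_relative("www.google.com", "../../asyouwish.html"))  # www.google.com/asyouwish.html
print(join_relative("www.google.com", "../../docs/page.html"))  # www.google.com/docs/page.html
```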

Upvotes: 0
