Reputation: 13

Python re.sub not replacing all occurrences of a string

I'm not getting the desired output. With the regular expression below, re.sub leaves only the last item instead of stripping the URL prefix from each one. Please explain what I'm doing wrong.

srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123|  http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
re.sub("http://.*[#]", "", srr)
'image-1CE005XG03'

Desired output: the string above with each http://www.google.com/# prefix removed.

image-1CCCC|image-1VVDD|image-123|image-1CE005XG03

Upvotes: 0

Answers (4)

anubhava

Reputation: 785058

Using the corrected regex in re.sub, as suggested in the comments:

import re

srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123|  http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
print(re.sub(r"\s*https?://[^#\s]*#", "", srr))

Output:

image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03

RegEx Details:

\s*: Match 0 or more whitespace characters
https?: Match http or https
://: Match ://
[^#\s]*: Match 0 or more characters that are neither # nor whitespace
#: Match a #
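
To see why the pattern from the question collapses everything into one match, here is a small side-by-side sketch. The variable names (greedy, bounded, lazy) are only for illustration, and the lazy `.*?` variant is an extra alternative that is not part of the answer above:

```python
import re

srr = ("http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| "
       "http://www.google.com/#image-123|  http://www.google.com/#image-123| "
       "http://www.google.com/#image-1CE005XG03")

# Pattern from the question: ".*" is greedy, so it runs from the first
# "http://" all the way to the LAST "#", and the whole span is one match.
greedy = re.sub(r"http://.*[#]", "", srr)

# Pattern from this answer: "[^#\s]*" cannot cross a "#" or whitespace,
# so every URL prefix is matched (and removed) separately.
bounded = re.sub(r"\s*https?://[^#\s]*#", "", srr)

# Illustrative alternative: a lazy quantifier stops at the FIRST "#".
lazy = re.sub(r"\s*https?://.*?#", "", srr)

print(greedy)   # image-1CE005XG03
print(bounded)  # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
print(lazy)     # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
```

So re.sub does replace every match; the original pattern simply produces a single match that covers almost the entire string.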

Upvotes: 0

Corralien

Reputation: 120399

>>> "|".join(re.findall(r'#([^|\s]+)', srr))
'image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03'

Upvotes: 1

sushanth

Reputation: 8302

Here is another solution, using str.split instead of a regex:

"|".join(i.split("#")[-1] for i in srr.split("|"))

image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03

Upvotes: 0

Tim Biegeleisen

Reputation: 521073

I would use re.findall here, rather than trying to do a replacement to remove the portions you don't want:

import re

src = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123|  http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
matches = re.findall(r'https?://www\.\S+#([^|\s]+)', src)
output = '|'.join(matches)
print(output)  # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03

Note that if you want to be more specific and match only Google URLs, you may use the following pattern instead:

https?://www\.google\.\S+#([^|\s]+)
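
As a rough sketch of how the Google-only pattern behaves, the snippet below runs it on made-up input. The strings mixed and packed, and the www.example.com URL, are hypothetical test data, not from the question:

```python
import re

google_only = r'https?://www\.google\.\S+#([^|\s]+)'

# Hypothetical input mixing Google and non-Google URLs, separated by "| ".
mixed = ("http://www.google.com/#image-1| http://www.example.com/#image-2| "
         "https://www.google.com/#image-3")
google_matches = re.findall(google_only, mixed)
print(google_matches)  # ['image-1', 'image-3']

# Caveat: "\S+" is greedy and only stops at whitespace. If the entries are
# not separated by spaces, one match swallows everything up to the last "#".
packed = "http://www.google.com/#a|http://www.google.com/#b"
packed_matches = re.findall(google_only, packed)
print(packed_matches)  # ['b']
```

If the input might not have whitespace after each "|", swapping \S+ for something like [^#\s]* should keep each match inside a single URL.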

Upvotes: 1
