Reputation: 13
I'm not getting the desired output: with my Python regular expression, re.sub seems to replace only the last occurrence. Please explain what I'm doing wrong.
import re
srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
re.sub("http://.*[#]", "", srr)
'image-1CE005XG03'
Desired output, with the http://www.google.com/# prefix removed from each item in the string above:
image-1CCCC|image-1VVDD|image-123|image-1CE005XG03
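What looks like only the last occurrence being handled is actually a single match: the greedy .* runs from the first http:// to the last # in the whole string, so re.sub removes everything up to the final id in one replacement. A minimal sketch comparing it with a lazy .*? variant; the \s* prefix in the lazy pattern is an addition here so the space after each | is removed as well:

```python
import re

srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"

# Greedy: .* grabs as much as it can, then backtracks only as far as the
# LAST '#', so one match covers every URL in the string.
greedy = re.sub(r"http://.*[#]", "", srr)

# Lazy: .*? stops at the FIRST '#' after each 'http://', so each URL prefix
# is removed by its own match; \s* also eats the space after each '|'.
lazy = re.sub(r"\s*http://.*?#", "", srr)

print(greedy)  # image-1CE005XG03
print(lazy)    # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
```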
Upvotes: 0
Views: 980
Reputation: 785058
Use a correct regex in re.sub, as suggested in the comment above:
import re
srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
print(re.sub(r"\s*https?://[^#\s]*#", "", srr))
Output:
image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
RegEx Details:
\s*: Match 0 or more whitespace characters
https?: Match http or https
://: Match ://
[^#\s]*: Match 0 or more of any characters that are neither # nor whitespace
#: Match a #
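The same pattern can be written with re.VERBOSE so each piece carries its own comment, mirroring the breakdown above. One detail to watch in this sketch: under re.VERBOSE an unescaped # outside a character class starts a comment, so the final # has to be written as \#. The second input string is made up purely to exercise the https? branch.

```python
import re

# Same regex as in the answer, laid out with re.VERBOSE
# (whitespace and comments inside the pattern are ignored).
URL_PREFIX = re.compile(r"""
    \s*         # 0 or more whitespace characters
    https?://   # http:// or https://
    [^#\s]*     # 0 or more characters that are neither # nor whitespace
    \#          # a literal # (escaped: a bare one would start a comment)
""", re.VERBOSE)

srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
print(URL_PREFIX.sub("", srr))
# image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03

# Hypothetical mixed input to check that https:// is handled too.
mixed = "https://example.com/#a| http://www.google.com/#b"
print(URL_PREFIX.sub("", mixed))  # a|b
```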
Upvotes: 0
Reputation: 120399
>>> "|".join(re.findall(r'#([^|\s]+)', srr))
'image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03'
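Because this pattern contains exactly one capturing group, re.findall returns only the group's text for each match, not the whole match. A small sketch spelling that out, with re.finditer as an equivalent that reads group 1 explicitly (the variable names are illustrative):

```python
import re

srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"

# One capturing group -> findall returns a list of the captured strings.
# [^|\s]+ stops at the next '|' or whitespace, so each capture is one id.
ids = re.findall(r'#([^|\s]+)', srr)
print(ids)
# ['image-1CCCC', 'image-1VVDD', 'image-123', 'image-123', 'image-1CE005XG03']

# Equivalent with finditer, pulling group 1 from each match object.
ids_iter = [m.group(1) for m in re.finditer(r'#([^|\s]+)', srr)]

print("|".join(ids))
# image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
```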
Upvotes: 1
Reputation: 8302
Here is another solution, without regular expressions:
"|".join(i.split("#")[-1] for i in srr.split("|"))
image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
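The split approach works because each |-separated piece holds one URL and the wanted id is whatever follows its last #. A sketch of the same idea as a reusable function; the name after_last_hash is hypothetical, str.rpartition is swapped in for split("#")[-1] (both return the whole piece when it contains no #), and the extra strip() is an added assumption to tolerate stray spaces:

```python
def after_last_hash(s: str, sep: str = "|") -> str:
    """Join the text after the last '#' in each sep-separated piece."""
    # rpartition returns (head, '#', tail); with no '#' present it returns
    # ('', '', piece), so tail is the whole piece, like split('#')[-1].
    # strip() removes whitespace around each id (an assumption, not required
    # for the sample string).
    return sep.join(piece.rpartition("#")[2].strip() for piece in s.split(sep))


srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
print(after_last_hash(srr))
# image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03

# Hypothetical input: a piece without '#' is kept unchanged.
print(after_last_hash("no-hash| http://x/#y"))  # no-hash|y
```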
Upvotes: 0
Reputation: 521073
I would use re.findall here, rather than trying to do a replacement to remove the portions you don't want:
import re
src = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
matches = re.findall(r'https?://www\.\S+#([^|\s]+)', src)
output = '|'.join(matches)
print(output) # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
Note that if you want to be more specific and match only Google URLs, you may use the following pattern instead:
https?://www\.google\.\S+#([^|\s]+)
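A quick check of how the Google-only pattern differs from the broader one. The input string below is made up for illustration and puts a non-Google URL between two Google ones:

```python
import re

broad = r'https?://www\.\S+#([^|\s]+)'
google_only = r'https?://www\.google\.\S+#([^|\s]+)'

# Hypothetical input: the middle URL is not a Google URL.
src = "http://www.google.com/#a| http://www.example.com/#b| https://www.google.co.uk/#c"

broad_out = '|'.join(re.findall(broad, src))
google_out = '|'.join(re.findall(google_only, src))

print(broad_out)   # a|b|c
print(google_out)  # a|c  (the example.com id is skipped)
```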
Upvotes: 1