Reputation: 651
I want to extract part of a URL using a regex. This may be an old question, but I am new to regex and, despite a lot of searching, could not find anything that fits my requirement. I know ParseURL can be used here, but my URLs are not structured properly enough for that. Suppose my URL is as follows:
url = https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed
Here I want to find where &q= occurs and extract everything up to the next &. I also want to get rid of the + signs and any other special characters in between. The output should be:
To Be Parsed out
Also if there is no match, the original URL should be returned.
I have tried the following,
re.search('q=?([^&]+)&',url).group(0)
This returns:
&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed
Can anybody help me parse this out? Thanks.
Upvotes: 0
Views: 583
Reputation: 42017
You can use re.search() to get the desired substring and then replace every + with a space using str.replace():
re.search(r'/&q=([^&]*)', url).group(1).replace('+', ' ')
re.search(r'/&q=([^&]*)', url).group(1) gets the desired portion, and replace('+', ' ') does the replacements.
Example:
In [56]: url
Out[56]: 'https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed'
In [57]: re.search(r'/&q=([^&]*)', url).group(1).replace('+', ' ')
Out[57]: 'To Be Parsed out'
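To see why group(1) is used here rather than group(0), the following small sketch (using the URL from the question) prints both. group(0) is the whole matched text, including the literal /&q= prefix, while group(1) is only the part captured by the parentheses. The variable names whole and captured are just for illustration.

```python
import re

url = 'https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed'

match = re.search(r'/&q=([^&]*)', url)

# group(0) is the entire match, including the literal '/&q=' prefix.
whole = match.group(0)      # '/&q=To+Be+Parsed+out'

# group(1) is only what the first pair of parentheses captured.
captured = match.group(1)   # 'To+Be+Parsed+out'

print(whole)
print(captured)
```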
If there is no match, re.search() returns None, so calling .group() on it raises AttributeError. Catch that exception, e.g.:
try:
    out = re.search(r'/&q=([^&]*)', url).group(1).replace('+', ' ')
except AttributeError:
    # No match: fall back to the original URL, as the question asks
    out = url
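If you would rather avoid the try/except, one possible sketch is to test the match object before calling group(). The function name extract_query is hypothetical and only for illustration. urllib.parse.unquote_plus comes from the standard library and is used here on the assumption that the value is URL-encoded, so it decodes %XX escapes as well as turning + into spaces.

```python
import re
from urllib.parse import unquote_plus


def extract_query(url):
    """Return the decoded '&q=' value, or the URL itself if there is no match.

    Hypothetical helper for illustration; it reuses the pattern from the answer.
    """
    match = re.search(r'/&q=([^&]*)', url)
    if match is None:
        # No match: return the original URL, as the question requires.
        return url
    # unquote_plus turns '+' into spaces and decodes %XX escapes.
    return unquote_plus(match.group(1))


print(extract_query('https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed'))
print(extract_query('https://www.sitename.com/no-query-here'))
```

The first call prints To Be Parsed out; the second prints the URL unchanged, because the pattern does not match.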
Upvotes: 3