Reputation: 761
I am looking into regular expressions (the re module) in Python. As part of this, I am trying to extract a substring from a string.
For instance, assume I have the string:
<place of birth="Stockholm">
Is there a way to extract Stockholm with a single regex call?
So far, I have:
import re
location_info = '<place of birth="Stockholm">'
#Remove before
location_name1 = re.sub(r"<place of birth=\"", r"", location_info)
#location_name1 --> Stockholm">
#Remove after
location_name2 = re.sub(r"\">", r"", location_name1)
#location_name2 --> Stockholm
Any advice on how to extract the string Stockholm without using two re.sub calls is highly appreciated.
Upvotes: 2
Views: 135
Reputation: 231
This code was tested under Python 3.6:
import re
test = '<place of birth="Stockholm">'
resp = re.sub(r'.*="(\w+)">', r'\1', test)
print(resp)
Stockholm
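One caveat worth noting: \w+ only matches letters, digits, and underscores, so a value containing a space would not match, and re.sub then returns the input unchanged. A small sketch of that behaviour follows; the "New York" input and the [^"]+ variant are illustrative assumptions, not part of the tested code above.

```python
import re

# Hypothetical input whose value contains a space.
test = '<place of birth="New York">'

# \w+ cannot cross the space, so nothing matches and re.sub returns the input as-is.
unchanged = re.sub(r'.*="(\w+)">', r'\1', test)

# [^"]+ accepts any run of non-quote characters instead.
resp = re.sub(r'.*="([^"]+)">', r'\1', test)
print(resp)
```

If the values are known to be single words, the original \w+ pattern is fine as it stands.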
Upvotes: 0
Reputation: 697
Is there a specific reason why you are removing the rest of the string instead of selecting the part you want, with something like this?
import re
location_info = '<place of birth="Stockholm">'
location_info = re.search(r'<.*="(.*)".*>', location_info, re.IGNORECASE).group(1)
Upvotes: 0
Reputation: 626826
Sure, you can match the beginning up to the double quotes, and match and capture all the characters other than double quotes after that:
import re
p = re.compile(r'<place of birth="([^"]*)')
location_info = "<place of birth=\"Stockholm\">"
match = p.search(location_info)
if match:
    print(match.group(1))
See IDEONE demo
The <place of birth=" is matched as a literal, and ([^"]*) is capture group 1, matching 0 or more characters other than ". The value is accessed with .group(1).
Here is a REGEX demo.
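If the extraction is needed in more than one place, the same pattern could be wrapped in a small helper that also covers the no-match case. This is only a sketch: the names PLACE_RE and extract_place are hypothetical and chosen for illustration.

```python
import re

# Compile once; the capture group grabs everything up to the next double quote.
PLACE_RE = re.compile(r'<place of birth="([^"]*)')

def extract_place(text):
    """Return the quoted place name, or None if the tag is absent."""
    match = PLACE_RE.search(text)
    return match.group(1) if match else None

print(extract_place('<place of birth="Stockholm">'))
print(extract_place('<no place here>'))
```

Returning None instead of raising mirrors the "if match:" check above, so the caller decides how to handle missing tags.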
Upvotes: 3
Reputation: 67968
print(re.sub(r'^[^"]*"|"[^"]*$', "", location_info))
This should do it for you. See demo.
https://regex101.com/r/vV1wW6/30#python
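To make the two branches of that alternation easier to follow, here is a rough sketch that lists what each branch removes. The variable names pattern, removed, and result are only illustrative, and location_info is assumed to hold the string from the question.

```python
import re

location_info = '<place of birth="Stockholm">'

# Left branch: from the start of the string through the first double quote.
# Right branch: from the last double quote through the end of the string.
pattern = r'^[^"]*"|"[^"]*$'

# findall lists the two pieces that re.sub would delete.
removed = re.findall(pattern, location_info)
result = re.sub(pattern, "", location_info)
print(removed)
print(result)
```

Because this approach deletes the surroundings instead of capturing the value, it relies on the string containing exactly one quoted value.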
Upvotes: 1