yulai

Reputation: 761

Regex in Python - Substring with single "re.sub" call

I am learning about regular expressions in Python. As part of this, I am trying to extract a substring from a string.

For instance, assume I have the string:

<place of birth="Stockholm">

Is there a way to extract Stockholm with a single regex call?

So far, I have:

import re

location_info = '<place of birth="Stockholm">'

# Remove everything before the value
location_name1 = re.sub(r'<place of birth="', r'', location_info)
# location_name1 --> Stockholm">

# Remove everything after the value
location_name2 = re.sub(r'">', r'', location_name1)
# location_name2 --> Stockholm

Any advice on how to extract the string Stockholm without using two "re.sub" calls would be highly appreciated.

Upvotes: 2

Views: 135

Answers (4)

xiyurui

Reputation: 231

This code was tested under Python 3.6:

import re

test = '<place of birth="Stockholm">'
# Match the whole tag and replace it with the captured value
resp = re.sub(r'.*="(\w+)">', r'\1', test)
print(resp)

Output:

Stockholm
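One thing worth checking with this pattern: \w+ only matches letters, digits and underscores, and re.sub returns the input unchanged when nothing matches. The sketch below illustrates that with a two-word value. The "New York" input and the [^"]+ variant are assumptions added for illustration, not part of the original answer.

```python
import re

pattern = r'.*="(\w+)">'

# A one-word value is matched by \w+ and extracted.
one_word = re.sub(pattern, r'\1', '<place of birth="Stockholm">')
print(one_word)

# A space is not matched by \w, so the pattern fails and re.sub
# returns the input string unchanged.
two_words = re.sub(pattern, r'\1', '<place of birth="New York">')
print(two_words)

# Assumed variant: accept any run of characters other than a double quote.
relaxed = re.sub(r'.*="([^"]+)">', r'\1', '<place of birth="New York">')
print(relaxed)
```

If the values can contain spaces or hyphens, the relaxed character class is probably the safer choice.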

Upvotes: 0

Enermis

Reputation: 697

Is there a specific reason why you are removing the rest of the string, instead of selecting the part you want with something like this?

import re

location_info = '<place of birth="Stockholm">'
location_info = re.search('<.*="(.*)".*>', location_info, re.IGNORECASE).group(1)

Upvotes: 0

Wiktor Stribiżew

Reputation: 626826

Sure, you can match the beginning up to the double quotes, and match and capture all the characters other than double quotes after that:

import re
p = re.compile(r'<place of birth="([^"]*)')
location_info = "<place of birth=\"Stockholm\">"
match = p.search(location_info)
if match:
    print(match.group(1))

See IDEONE demo

The <place of birth=" part is matched literally, and ([^"]*) is capture group 1, which matches zero or more characters other than ". The captured value is accessed with .group(1).

Here is a REGEX demo.
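If this is needed in more than one place, the same pattern can be wrapped in a small function. This is only a sketch: the names PLACE_RE and extract_place are hypothetical, and returning None for a missing tag is an assumption about what the caller wants.

```python
import re

# Hypothetical names; the pattern itself is the one used above.
PLACE_RE = re.compile(r'<place of birth="([^"]*)')

def extract_place(text):
    """Return the quoted place of birth, or None if the tag is absent."""
    match = PLACE_RE.search(text)
    return match.group(1) if match else None

print(extract_place('<place of birth="Stockholm">'))
print(extract_place('<place of birth="New York">'))
print(extract_place('<name="Anna">'))
```

Because the group excludes only the double quote, values containing spaces are captured as well.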

Upvotes: 3

vks

Reputation: 67968

import re

location_info = '<place of birth="Stockholm">'
print(re.sub(r'^[^"]*"|"[^"]*$', "", location_info))

This should do it for you. See the demo:

https://regex101.com/r/vV1wW6/30#python
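To see why a single call is enough, it may help to look at the two alternatives separately. The sketch below splits the pattern into two pieces; the names head and tail are hypothetical and used only for illustration.

```python
import re

location_info = '<place of birth="Stockholm">'

# First alternative: from the start of the string through the first double quote.
head = re.compile(r'^[^"]*"')
# Second alternative: from the last double quote through the end of the string.
tail = re.compile(r'"[^"]*$')

removed_head = head.search(location_info).group(0)
removed_tail = tail.search(location_info).group(0)
print(removed_head)
print(removed_tail)

# Combined with |, re.sub removes both pieces in one pass.
result = re.sub(r'^[^"]*"|"[^"]*$', "", location_info)
print(result)
```

Note that this keeps everything between the first and the last double quote, so it assumes the tag has only one quoted value.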

Upvotes: 1
