Reputation: 1217
I have the following strings:
NAME John Nash FROM California
NAME John Nash
I want a regular expression capable of extracting 'John Nash' from both strings.
Here is what I tried:
"NAME(.*)(?:FROM)"
"NAME(.*)(?:FROM)?"
"NAME(.*?)(?:FROM)?"
but none of these works for both strings.
Upvotes: 5
Views: 2849
Reputation: 1852
You can do without regex:
>>> myStr = "NAME John Nash FROM California"
>>> myStr.split("FROM")[0].replace("NAME","").strip()
'John Nash'
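To check the same idea against both strings from the question, here is a minimal sketch. The function name name_without_regex is made up for illustration, and it uses str.partition rather than split, which behaves the same way when FROM is absent.

```python
def name_without_regex(text):
    # Keep only what precedes the first "FROM" (the whole string if absent).
    head, _, _ = text.partition("FROM")
    # Drop the leading keyword once and trim the surrounding spaces.
    return head.replace("NAME", "", 1).strip()

for s in ("NAME John Nash FROM California", "NAME John Nash"):
    print(name_without_regex(s))
```

Note that this cuts at the first occurrence of the substring FROM, so a name that itself contains those letters would be truncated.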
Upvotes: 0
Reputation: 98921
Make the second part of the string optional with (?: FROM.*?)?, i.e.:
NAME (.*?)(?: FROM.*?)?$
MATCH 1
1. [5-14] `John Nash`
MATCH 2
1. [37-46] `John Nash`
MATCH 3
1. [53-66] `John Doe Nash`
Regex Demo
https://regex101.com/r/bL7kI2/2
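A minimal Python sketch of how this pattern could be applied, assuming each string is matched on its own (so no flags are needed). The helper name extract_name is hypothetical.

```python
import re

# Pattern from this answer: lazy capture, then an optional " FROM ..." tail.
PATTERN = re.compile(r"NAME (.*?)(?: FROM.*?)?$")

def extract_name(text):
    """Return the captured name, or None if the text does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

for s in ("NAME John Nash FROM California", "NAME John Nash"):
    print(extract_name(s))
```

The trailing $ is what forces the lazy group to grow: without it, (.*?) would be satisfied with an empty match.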
Upvotes: 2
Reputation: 107297
You can use a logical OR between FROM and the anchor $:
NAME(.*?)(?:FROM|$)
See demo https://regex101.com/r/rR3gA0/1
Here, the name must be followed by either FROM or the end of the string. In your regexes FROM is optional, so nothing forces the capture group to stop at it: the greedy (.*) consumes the rest of the string after the name, and the lazy (.*?) matches nothing at all.
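The difference can be seen by running the three attempts from the question next to a lazy capture that must be followed by FROM or the end of the string. This is only an illustrative sketch; the helper captured and the variable names are made up here.

```python
import re

samples = ["NAME John Nash FROM California", "NAME John Nash"]
patterns = [
    r"NAME(.*)(?:FROM)",      # FROM is mandatory: fails on the second string
    r"NAME(.*)(?:FROM)?",     # greedy capture swallows "FROM California"
    r"NAME(.*?)(?:FROM)?",    # lazy capture is satisfied with nothing
    r"NAME(.*?)(?:FROM|$)",   # lazy capture must reach FROM or the end
]

def captured(pattern, text):
    """Return group 1 of the first match, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

results = {p: [captured(p, s) for s in samples] for p in patterns}
for pattern, groups in results.items():
    print(pattern, groups)
```

Only the last pattern captures the name in both strings; the captured text still carries the surrounding spaces, so call strip() on it if needed.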
If you want a more general regex, it is better to build it around the shapes a name can take. For example, if you are sure that names always consist of two words, you can use the following regex:
NAME\s(\w+\s\w+)
Demo https://regex101.com/r/kV2eB9/2
Upvotes: 5
Reputation: 5658
r'^\w+\s+(\w+\s+\w+)' matches a word at the start of the string, followed by one or more spaces, and then captures two words with at least one space between them.
import re

with open('data', 'r') as f:
    for line in f:
        # capture the two words that follow the leading keyword
        mo = re.search(r'^\w+\s+(\w+\s+\w+)', line)
        if mo:
            print(mo.group(1))
John Nash
John Nash
Upvotes: 0