Reputation: 6209
I have a regex that, given the full name, is supposed to capture the first and last name. It should exclude the suffix, like "Jr.":
(.+)\s(.+(?!\sJr\.))
But this regex applied against the string Larry Farry Barry Jones Jr.
gives the match:
1. Larry Farry Barry Jones
2. Jr.
Why is my negative lookahead failing to ignore the "Jr." when parsing the full name? I want match #2 to contain "Jones".
Upvotes: 1
Views: 5226
Reputation: 92976
The reason is that your second .+ matches all the way to the end of the string, and only then is the lookahead checked. At that point nothing follows (we are already at the end), so there is no " Jr." ahead, the negative lookahead succeeds, and the match is accepted.
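To see this for yourself, here is a small Ruby sketch that runs the pattern from the question and prints the capture groups. The constant `ORIGINAL_PATTERN` and the helper `original_groups` are names made up for this illustration, not anything from the question.

```ruby
# The pattern from the question, unchanged.
ORIGINAL_PATTERN = /(.+)\s(.+(?!\sJr\.))/

# Hypothetical helper: returns the two capture groups, or nil if nothing matches.
def original_groups(full_name)
  match = ORIGINAL_PATTERN.match(full_name)
  match && match.captures
end

# The second .+ swallows "Jr." and the lookahead is only tested at the end
# of the string, where no " Jr." can follow, so it passes.
p original_groups("Larry Farry Barry Jones Jr.")
# => ["Larry Farry Barry Jones", "Jr."]
```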
The real problem is the pattern itself. This would work better:
\S+(?:\s(?!Jr\.)\S+)*
See it here on Regexr
Means:
\S+
Match a run of one or more non-whitespace characters.
(?:\s(?!Jr\.)\S+)*
Non-capturing group: match a whitespace character and then, if what follows is not "Jr.", match the next run of non-whitespace characters. The whole group can repeat zero or more times.
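As a rough sketch of how this could be used in Ruby: match the name without the suffix, then split it to pick out the first and last words. The constant `NAME_WITHOUT_SUFFIX` and the helper `first_and_last` are hypothetical names chosen for this example, and only the "Jr." suffix is handled.

```ruby
# The pattern from this answer: words separated by single whitespace,
# stopping before a word that starts with "Jr.".
NAME_WITHOUT_SUFFIX = /\S+(?:\s(?!Jr\.)\S+)*/

# Hypothetical helper: returns [first_name, last_name], or nil if nothing matches.
def first_and_last(full_name)
  name = full_name[NAME_WITHOUT_SUFFIX]
  return nil if name.nil?

  parts = name.split
  [parts.first, parts.last]
end

p "Larry Farry Barry Jones Jr."[NAME_WITHOUT_SUFFIX]
# => "Larry Farry Barry Jones"
p first_and_last("Larry Farry Barry Jones Jr.")
# => ["Larry", "Jones"]
```

The regex only strips the suffix here. Picking the first and last name is left to `split`, which keeps the pattern short.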
Upvotes: 1
Reputation: 20514
As a comment mentions, it is the first .+ that matches most of the string. The use of a lookahead seems incorrect here, as you do not want to return that value and do not need it to be included in a further match.
The following matches each word separately without returning the "Jr.", so you could take the first and last results.
(\w+\s)+?(?!\sJr\.)
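A minimal sketch of how that could look with `String#scan`, assuming the input always ends with the "Jr." suffix. `WORD_PATTERN` and `scanned_words` are hypothetical names for this example. Each match includes its trailing whitespace, hence the `strip`.

```ruby
# The pattern from this answer, unchanged.
WORD_PATTERN = /(\w+\s)+?(?!\sJr\.)/

# Hypothetical helper: scan returns one array per match (one entry per
# capture group), so flatten it and strip the trailing whitespace.
def scanned_words(full_name)
  full_name.scan(WORD_PATTERN).flatten.map(&:strip)
end

words = scanned_words("Larry Farry Barry Jones Jr.")
p words
# => ["Larry", "Farry", "Barry", "Jones"]
p [words.first, words.last]
# => ["Larry", "Jones"]
```

One caveat: the pattern requires whitespace after every word, so a name with no suffix such as "Larry Jones" would lose its last word.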
I recommend Rubular for practicing Ruby RegExp.
Upvotes: 1
Reputation: 168091
Rather than trying to do it with a single regex, I think the following would be more maintainable code.
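full_name = "Larry Farry Barry Jones Jr."
# Split on whitespace, then drop the suffix token.
name_parts = full_name.split - ["Jr."]
# The first and last remaining tokens are the first and last names.
first_name, last_name = name_parts[0], name_parts[-1]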
Upvotes: 2