Reputation: 2715
To find names in a large text, I have the following regex:
([A-Z][a-z]*)[\s-]([A-Z][a-z]*)
This works fine for normal names like "Jack Oneill" or "John Guidetti". But there are a few forms that I want to match and cannot, like:
Chandler Murial Bing
Gandalf the Gray
Pieter van den Woude
I cannot seem to get this right with my limited knowledge of Regular Expressions. Can anyone help me (and please provide a good website/book for this) :)
Upvotes: 3
Views: 38114
Reputation: 33207
The best way to approach a regular expression problem is to first describe the matches you are looking for (this description is usually called a grammar).
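For example, from your question, I might describe it like the following:
A name word is a capital letter followed by either lowercase letters or "." (an initial).
A full name is one or more name words, then up to two lowercase words (such as "van den" or "the"), then a final name word.
If this provides a reasonably close match to the desired result set (and to be clear, for names, there are so many variations that you will either have false positives or false negatives), then you begin constructing the expression: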
[A-Z]([a-z]+|\.)
[a-z][a-z\-]+
Result:
[A-Z]([a-z]+|\.)(?:\s+[A-Z]([a-z]+|\.))*(?:\s+[a-z][a-z\-]+){0,2}\s+[A-Z]([a-z]+|\.)
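One way to see how the two sub-expressions combine into the result is to assemble it from named pieces. This is only a sketch in Python; the names NAME_WORD, LOWER_WORD, FULL_NAME and is_full_name are made up for illustration and do not come from the answer.

```python
import re

# Building blocks, copied from the two sub-expressions above.
# The constant names are illustrative assumptions.
NAME_WORD = r"[A-Z]([a-z]+|\.)"   # capitalized word or an initial such as "A."
LOWER_WORD = r"[a-z][a-z\-]+"     # lowercase word such as "van", "den", "the"

# first word, any number of further capitalized words,
# up to two lowercase words, then a final capitalized word
FULL_NAME = (
    NAME_WORD
    + r"(?:\s+" + NAME_WORD + r")*"
    + r"(?:\s+" + LOWER_WORD + r"){0,2}"
    + r"\s+" + NAME_WORD
)


def is_full_name(candidate: str) -> bool:
    """Return True if the whole string matches the assembled expression."""
    return re.fullmatch(FULL_NAME, candidate) is not None


for candidate in ["Jack Oneill", "Chandler Murial Bing",
                  "Gandalf the Gray", "Pieter van den Woude"]:
    print(candidate, is_full_name(candidate))
```

Keeping the pieces separate makes it easier to adjust one rule (for example, allowing more lowercase words) without retyping the whole expression.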
Sample text:
Hello my name is Chandler Muriel Bing. I have a friend who is named Pieter van den Woude and he has another friend, A. A. Milne. Gandalf the Gray joins us. Together, we make up the Friends Cast and Crew.
Matches:
Chandler Muriel Bing
Pieter van den Woude
A. A. Milne
Gandalf the Gray
Friends Cast and Crew
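To check which substrings the expression picks out of this text, a short Python sketch can run it directly. The variable names are illustrative. Note that re.finditer is used rather than re.findall, because the expression contains capturing groups and findall would return those groups instead of the whole match.

```python
import re

# The result expression from above, unchanged.
PATTERN = re.compile(
    r"[A-Z]([a-z]+|\.)(?:\s+[A-Z]([a-z]+|\.))*"
    r"(?:\s+[a-z][a-z\-]+){0,2}\s+[A-Z]([a-z]+|\.)"
)

TEXT = (
    "Hello my name is Chandler Muriel Bing. I have a friend who is named "
    "Pieter van den Woude and he has another friend, A. A. Milne. "
    "Gandalf the Gray joins us. Together, we make up the Friends Cast and Crew."
)

# group(0) is the full matched substring, ignoring the inner groups.
matches = [m.group(0) for m in PATTERN.finditer(TEXT)]
for name in matches:
    print(name)
# Chandler Muriel Bing
# Pieter van den Woude
# A. A. Milne
# Gandalf the Gray
# Friends Cast and Crew
```

The last match, "Friends Cast and Crew", is a false positive: it fits the grammar but is not a person's name.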
Problems:
Upvotes: 15
Reputation: 4523
In your case, just append another group to your expression to match a third capitalized word:
[\s-]([A-Z][a-z]*)
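As a rough Python sketch of that suggestion: appending the group as-is would require three words, so the version below wraps it in an optional non-capturing group to keep two-word names matching. That optional wrapper and the names THREE_WORDS and find_name are assumptions for illustration, not part of the answer.

```python
import re

# The question's expression plus one more "[\s-]([A-Z][a-z]*)" part.
# Wrapping the extra part in (?: ... )? is an assumption made here so that
# two-word names such as "Jack Oneill" still match.
THREE_WORDS = re.compile(
    r"([A-Z][a-z]*)[\s-]([A-Z][a-z]*)(?:[\s-]([A-Z][a-z]*))?"
)


def find_name(text: str):
    """Return the first matched name in text, or None if nothing matches."""
    m = THREE_WORDS.search(text)
    return m.group(0) if m else None


print(find_name("Jack Oneill"))           # Jack Oneill
print(find_name("Chandler Murial Bing"))  # Chandler Murial Bing
print(find_name("Gandalf the Gray"))      # None
print(find_name("Pieter van den Woude"))  # None
```

This covers "Chandler Murial Bing" but still misses names containing lowercase words, such as "Gandalf the Gray" and "Pieter van den Woude".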
Generally speaking, regex is not well suited to this problem: there are too many special cases, so you will end up needing to build a list of those names.
For complex names, you may want to look into natural language processing: http://en.wikipedia.org/wiki/Natural_language_processing
Upvotes: 1