Reputation: 177
I have some text that could look something like this:
Name is William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a fake name.
I would like to run a regular expression against that string and pull out
William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain
as a match.
My current regex looks like this:
/\b((NAME\s\s*)(((\s*\,*\s*)? *)(([A-Z\'\-])([A-Za-z\'\-]+)*\s*){2,})?)\b/ig
and it does most of what I want, but it's not perfect. Instead of just getting the name, it also captures the "is a" that follows the name, like this:
"William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a"
What regex pattern gets only the words that start with a capital letter after the "Name" label, and stops when the next word (after a space) starts with a lowercase letter?
Upvotes: 5
Views: 2534
Reputation: 27723
My guess is that this simple expression might work, as long as the desired output is always followed by "is":
Name is (.+?) is.+
use strict;
use warnings;
my $str = 'Name is William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a fake name.';
# The lazy (.+?) stops at the first " is" that follows the name.
my $regex = qr/Name is (.+?) is.+/;
if ( $str =~ $regex ) {
    # Capture group 1 holds the name; its start/end offsets are in $-[1] and $+[1].
    print "Name: $1\n";
    # Offsets of the whole match are in $-[0] and $+[0].
}
zdim suggests:
Perhaps, since the next word may not be "is" but just any lowercase word (so, after a word boundary), something like
/\b([A-Z].+?)\b[a-z.!?]/
... (it probably needs tweaking, especially for a sentence that may end right after the name).
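A quick check of zdim's pattern, sketched with Python's built-in re module (an assumption: these particular constructs behave the same way there as in Perl). Run over the whole sentence, the first capitalised word it finds is the "Name" label itself; starting the search just after the "Name is " prefix returns the name followed by one space.

```python
import re

text = "Name is William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a fake name."

# zdim's pattern: a capitalised run, lazily extended until a word boundary
# is followed by a lowercase letter or sentence punctuation.
pattern = re.compile(r"\b([A-Z].+?)\b[a-z.!?]")

# On the whole sentence, the label "Name" is the first capitalised word.
first = pattern.search(text)
print(repr(first.group(1)))  # 'Name '

# Starting the search after the label (assuming the text contains "Name is ")
# yields the name plus one trailing space, which rstrip() removes.
start = text.index("Name is ") + len("Name is ")
name = pattern.search(text, start).group(1).rstrip()
print(name)
```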
Upvotes: 1
Reputation: 12438
You can use:
Name\b[\sa-z]*\K(?:[A-Z][a-z]+[\s-]*)+(?=\s[a-z])
where
- \K resets the starting point of the match after Name followed by some lowercase words has been matched
- (?:[A-Z][a-z]+[\s-]*)+ matches all the words starting with a capital letter
- (?=\s[a-z]) adds the constraint that the following word starts with a lowercase letter
demo: https://regex101.com/r/WBrdFU/1/
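Python's built-in re module does not support \K, so a minimal sketch of the same idea swaps \K for a capturing group around the name and reads it back with group(1). Everything else in the pattern is unchanged.

```python
import re

text = "Name is William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a fake name."

# Same pattern as above, with \K replaced by a capturing group (re has no \K).
pattern = re.compile(r"Name\b[\sa-z]*((?:[A-Z][a-z]+[\s-]*)+)(?=\s[a-z])")

match = pattern.search(text)
name = match.group(1) if match else None
print(name)  # William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain
```

The lookahead forces the engine to give back the space after "Mountain", so the captured name has no trailing whitespace.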
Notes:
You shouldn't use the i option in your regex: if you do, every character class such as [A-Z] will match lowercase letters as well as uppercase ones, which prevents you from selecting only the words that start with a capital letter.
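A small illustration of that note, using Python's re.IGNORECASE as a stand-in for the i flag: the same [A-Z][a-z]+ that rejects the lowercase word "is" accepts it once case is ignored.

```python
import re

word_pattern = r"[A-Z][a-z]+"  # a word starting with a capital letter

case_sensitive = re.fullmatch(word_pattern, "is") is not None
ignore_case = re.fullmatch(word_pattern, "is", re.IGNORECASE) is not None

print(case_sensitive)  # False: "is" starts with a lowercase letter
print(ignore_case)     # True: with IGNORECASE, [A-Z] also matches "i"
```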
To also match names that contain an apostrophe:
Name\b[\sa-z]*\K(?:[A-Z][a-z'\s-]*?)+(?=\s[a-z])
demo: https://regex101.com/r/WBrdFU/3/
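To see why the apostrophe variant helps, here is a sketch that runs both patterns against a hypothetical sentence (again with \K swapped for a capturing group, since Python's re lacks \K). The first pattern needs at least one lowercase letter right after each capital, so "O'Brien" stops it; the second allows apostrophes inside a word.

```python
import re

# Hypothetical input with an apostrophe in the name.
text = "Name is Conan O'Brien is a fake name."

original = re.compile(r"Name\b[\sa-z]*((?:[A-Z][a-z]+[\s-]*)+)(?=\s[a-z])")
with_apostrophe = re.compile(r"Name\b[\sa-z]*((?:[A-Z][a-z'\s-]*?)+)(?=\s[a-z])")

print(original.search(text))                  # None: "O'" is not [A-Z][a-z]+
print(with_apostrophe.search(text).group(1))  # Conan O'Brien
```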
Upvotes: 1
Reputation: 1794
This worked when I tested it on regex101.com. Please check and let me know whether it works for you:
/Name is (([\s]*[A-Z][-a-z]*)*)/
Group 1 contains: William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain
You can test it at the link below:
https://regex101.com/r/M2V2in/2
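The same pattern can be checked with Python's re module (assumed here to behave like regex101's default engine for these constructs). Group 1 comes back without a trailing space because the optional whitespace is matched at the start of each word rather than at the end.

```python
import re

text = "Name is William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a fake name."

# Each repetition eats optional leading whitespace, one capital letter,
# then any run of lowercase letters and hyphens.
pattern = re.compile(r"Name is (([\s]*[A-Z][-a-z]*)*)")

name = pattern.search(text).group(1)
print(name)  # William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain
```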
Upvotes: 0
Reputation: 1186
How about /Name ((?:[A-Z]\w+[ -]?)+)/ ?
Regex101: https://regex101.com/r/BFJBpZ/1
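A quick check in Python's re module against the question's exact sentence. Because this pattern expects a capitalised word immediately after "Name ", the lowercase "is" in "Name is" prevents a match. With "is " added to the literal prefix (a tweak made here only for illustration, not part of the answer), group 1 is the name plus the trailing space consumed by [ -]?.

```python
import re

text = "Name is William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a fake name."

as_posted = re.compile(r"Name ((?:[A-Z]\w+[ -]?)+)")
# Illustrative tweak: also consume the "is " that follows the label.
with_is = re.compile(r"Name is ((?:[A-Z]\w+[ -]?)+)")

print(as_posted.search(text))  # None
name = with_is.search(text).group(1).rstrip()
print(name)
```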
Upvotes: 4