adhoc

Reputation: 177

How do I get all words that begin with a capital letter following a specific string?

I have some text that could look something like this:

Name is William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a fake name.

I would like to run a regular expression against that string and pull out

William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain 

as a match.

My current regex looks like this:

/\b((NAME\s\s*)(((\s*\,*\s*)? *)(([A-Z\'\-])([A-Za-z\'\-]+)*\s*){2,})?)\b/ig

and it does most of what I want but it's not perfect. Instead of just getting the name, it is also getting the "is a" following the name like this:

"William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a"

What regex would match only the words starting with a capital letter that follow the "Name" label, stopping when the next word after a space starts with a lowercase letter?

Upvotes: 5

Views: 2534

Answers (4)

Emma

Reputation: 27723

My guess is that this simple expression might work, provided the word "is" always follows the desired output:

Name is (.+?) is.+

Test

use strict;
use warnings;

my $str = 'Name is William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain is a fake name.';
# Lazily capture everything between "Name is " and the next " is"
my $regex = qr/Name is (.+?) is.+/p;

if ( $str =~ $regex ) {
  # The name itself is in capture group 1, not in the whole match
  print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
  print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
}

# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}


Advice

zdim advises that:

Perhaps, as it may not be "is" but just any lowercase word (so after a word boundary), something like /\b([A-Z].+?)\b[a-z.!?]/ ... (probably needs tweaking, especially for a possible end of sentence right after the name)?

Upvotes: 1

Allan

Reputation: 12438

You can use:

Name\b[\sa-z]*\K(?:[A-Z][a-z]+[\s-]*)+(?=\s[a-z])

where

  • \K resets the start of the reported match, so Name and any lowercase words that follow it are matched but left out of the result
  • (?:[A-Z][a-z]+[\s-]*)+ matches all the words starting with a capital letter
  • (?=\s[a-z]) adds the constraint that the following word starts with a lowercase letter

demo: https://regex101.com/r/WBrdFU/1/
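
As a rough illustration of the pattern above outside regex101, here is a minimal Python sketch. Python's re module does not support \K, so this version wraps the capitalized words in a capture group instead; that change is an adaptation, and the names TEXT, NAME_PATTERN, and extract_name are made up for the example.

```python
import re

# Sample text from the question.
TEXT = ("Name is William Bob Francis Ford Coppola-Mr-Cool "
        "King-Of-The-Mountain is a fake name.")

# Python's re has no \K, so a capture group (an adaptation, not the
# answer's exact pattern) isolates the capitalized words instead.
NAME_PATTERN = re.compile(
    r"Name\b[\sa-z]*"             # label plus lowercase filler such as "is"
    r"((?:[A-Z][a-z]+[\s-]*)+)"   # one or more capitalized words
    r"(?=\s[a-z])"                # next word must start with a lowercase letter
)


def extract_name(text):
    """Return the capitalized run after 'Name', or None if nothing matches."""
    match = NAME_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_name(TEXT))
```

The lookahead makes the engine give back the trailing space, so the captured name ends at "Mountain" with no whitespace after it.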

Notes:

You shouldn't use the i option in your regex. With it, every [A-Z] character class matches lowercase letters as well as uppercase ones, so you can no longer select only the words that start with a capital letter.
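
The effect of the i option can be seen in a small Python sketch. The pattern and sample text here are simplified stand-ins chosen for the illustration, not the regex from the question, and re.IGNORECASE is Python's equivalent of the i flag.

```python
import re

# Simplified stand-ins for the illustration (not the question's regex).
TEXT = "William Bob is a fake"
WORD = r"\b[A-Z][a-z]*\b"   # a word that starts with a capital letter

# Case-sensitive: only the capitalized words are found.
case_sensitive = re.findall(WORD, TEXT)

# With IGNORECASE, [A-Z] also accepts lowercase letters,
# so every word in the text qualifies.
case_insensitive = re.findall(WORD, TEXT, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

If the "Name" label itself has to be matched case-insensitively, one option is to spell out the alternatives for the label only and leave the rest of the pattern case-sensitive.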

To also handle names that contain an apostrophe:

Name\b[\sa-z]*\K(?:[A-Z][a-z'\s-]*?)+(?=\s[a-z])

demo: https://regex101.com/r/WBrdFU/3/
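
A hedged Python sketch of the apostrophe variant follows. As before, \K is replaced by a capture group because Python's re lacks it, and the "Conan O'Brien" sentence is an invented test input, not text from the question.

```python
import re

# Capture-group adaptation of the apostrophe variant (Python's re has no \K).
APOSTROPHE_PATTERN = re.compile(
    r"Name\b[\sa-z]*"               # label plus lowercase filler such as "is"
    r"((?:[A-Z][a-z'\s-]*?)+)"      # capitalized chunks, apostrophes allowed
    r"(?=\s[a-z])"                  # stop before a space + lowercase word
)


def extract_name_with_apostrophe(text):
    """Return the name after 'Name', or None if nothing matches."""
    match = APOSTROPHE_PATTERN.search(text)
    return match.group(1) if match else None


# Invented sample sentence for the illustration.
print(extract_name_with_apostrophe("Name is Conan O'Brien is a host."))
```

Because the inner quantifier is lazy, the match stops at the first place where a space is followed by a lowercase letter.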

Upvotes: 1

JBone

Reputation: 1794

This worked when I tested it on regex101.com. Please check and let me know if it works for you.

  /Name is (([\s]*[A-Z][-a-z]*)*)/

Group 1 contains: William Bob Francis Ford Coppola-Mr-Cool King-Of-The-Mountain

You can test it at the link below:

https://regex101.com/r/M2V2in/2
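
For anyone trying this outside regex101, a minimal Python sketch of the same pattern might look like the following. The function name labeled_name is hypothetical, and the pattern assumes the label is always the literal text "Name is ".

```python
import re

# The pattern from this answer, unchanged; group 1 holds the name.
LABELED = re.compile(r"Name is (([\s]*[A-Z][-a-z]*)*)")


def labeled_name(text):
    """Return group 1 (possibly empty), or None if 'Name is ' is absent."""
    match = LABELED.search(text)
    return match.group(1) if match else None


sample = ("Name is William Bob Francis Ford Coppola-Mr-Cool "
          "King-Of-The-Mountain is a fake name.")
print(labeled_name(sample))
```

No lookahead is needed here: the repeated group simply stops at the first word that does not start with a capital letter. One consequence is that the regex still matches when no capitalized word follows the label, and group 1 is then an empty string.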

Upvotes: 0

truth

Reputation: 1186

How do you like /Name ((?:[A-Z]\w+[ -]?)+)/?

Regex101: https://regex101.com/r/BFJBpZ/1

Upvotes: 4
