Reputation: 307
I'm trying to write a regex that parses a full name and splits it into first name, middle name, and last name. This should be easy, but it's pretty hard once you see the kinds of names I have to parse. I could write one long regex that accounts for every case separately, but I think a smaller, more flexible regex is possible, and that's why I'm asking for help.
I think these are all of the name formats I have to handle.
Here are the formats that need to be parsed (each line ends with three commas):
(first name) (middle initial). (last name),,, //one middle initial followed by a period
(first name) (last name),,, //simple first and last
(No name),,, //no name
(first name) (last name)-(last name),,, //two last names separated by a dash
(first name) (middle initial). (middle initial). (last name),,, //two middle initials separated by a space
(first name) (last name w/ apostrophe),,, //last name with an apostrophe
(first name) (middle name) (last name),,, //first, middle, and last name
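Because the seven formats above form a closed list, one option is a regex assembled from small named pieces, one per kind of token. The following is a minimal Perl sketch that assumes ASCII letters only and nothing after the three commas. `parse_name` and the `$word`/`$initial`/`$last` patterns are hypothetical names chosen for illustration.

```perl
use 5.010;
use strict;
use warnings;

# Token patterns built only from the formats listed above (an assumption:
# real data may contain shapes this does not cover).
my $word    = qr/[A-Za-z]+/;                     # plain first or middle name
my $initial = qr/[A-Z]\./;                       # a middle initial like "B."
my $last    = qr/[A-Za-z']+(?:-[A-Za-z']+)?/;    # O'postrophe, Bling-Bling

# Hypothetical helper: returns a hashref with first/middle/last,
# an empty hashref for the "no name" line, or undef if nothing matches.
sub parse_name {
    my ($line) = @_;
    $line =~ s/,,,\s*\z//;                       # strip the trailing ",,,"
    return {} if $line eq '';                    # the "(No name),,," case
    return undef unless $line =~ m{
        \A (?<first>$word)
        (?: \s+ (?<middle> $word | $initial (?: \s+ $initial )? ) )?
        \s+ (?<last>$last) \z
    }x;
    return { first => $+{first}, middle => $+{middle} // '', last => $+{last} };
}

for my $line ("Foo B. Baz,,,", "Ed O'postrophe,,,", ",,,", "Cher,,,") {
    my $r = parse_name($line);
    if    (!defined $r) { say "no match: $line" }
    elsif (!%$r)        { say 'no name' }
    else                { say join ' | ', @{$r}{qw(first middle last)} }
}
```

Because each token type is spelled out, a line that fits none of the listed formats (such as a single word) is reported as unmatched instead of being split anyway.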
Upvotes: 2
Views: 2458
Reputation: 39158
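use 5.010;
use strict;
use warnings;
use DDS;    # Data::Dump::Streamer; provides DumpLex

while (<DATA>) {
    chomp;
    s/,,,.*//;    # drop the trailing ",,," and anything after it
    if ($_ eq '') {    # ",,," leaves an empty string, not a space
        say 'no name';
        next;
    }
    # first word, optional (lazy) middle part, last word
    if (/\A (?<first>\S+) \s+ (?<middle>.*?)? (?:\s+)? (?<last>\S+) \z/msx) {
        DumpLex \%+;
    } else {
        say "could not parse: $_";    # e.g. a single-word name
    }
}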
__DATA__
Foo B. Baz,,,
Fnord Quux,,,
,,,
Xyzzy Bling-Bling,,,
Abe C. D. Efg,,,
Ed O'postrophe,,,
First Middle Last,,,
Output:
$HASH1 = {
first => 'Foo',
last => 'Baz',
middle => 'B.'
};
$HASH1 = {
first => 'Fnord',
last => 'Quux',
middle => ''
};
no name
$HASH1 = {
first => 'Xyzzy',
last => 'Bling-Bling',
middle => ''
};
$HASH1 = {
first => 'Abe',
last => 'Efg',
middle => 'C. D.'
};
$HASH1 = {
first => 'Ed',
last => 'O\'postrophe',
middle => ''
};
$HASH1 = {
first => 'First',
last => 'Last',
middle => 'Middle'
};
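The same rule (first word, last word, everything in between is the middle) can also be written without a regex. The following is a minimal Perl sketch that assumes whitespace-separated words. `split_name` is a hypothetical helper and is not part of the answer above.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: first word, last word, and whatever lies between.
sub split_name {
    my ($line) = @_;
    $line =~ s/,,,.*//s;             # drop the trailing ",,," (and anything after)
    my @words = split ' ', $line;    # awk-style split: ignores leading/trailing blanks
    return undef if @words < 2;      # empty line or single word: nothing to split
    my $first = shift @words;
    my $last  = pop @words;
    return { first => $first, middle => join(' ', @words), last => $last };
}

for my $line ('Foo B. Baz,,,', 'Abe C. D. Efg,,,', ',,,') {
    my $r = split_name($line);
    say defined $r ? "$r->{first} | $r->{middle} | $r->{last}" : 'no name';
}
```

Unlike the regex, `split ' '` collapses runs of spaces inside the middle part. A one-word name returns undef rather than being split.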
Upvotes: 3
Reputation: 53976
You can't parse something that ultimately follows no rules and hope to have any success. The problem is not translating the algorithm to a regular expression, but writing the algorithm to begin with.
Consider: how would you write an algorithm that could properly parse all these names into Given, Middle, and Family names?
See what I mean? You'd need something close to an AI to assign each of these words to the right part of the name. Some people use two names as their "given" name. Some people use titles or honorifics, and some cultures place the family name first and the given name last.
Summary: Don't do it. If you cannot get the user to separate their name into specific parts for you, you must treat each name as a single atom.
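To make the objection concrete, the following minimal Perl sketch applies the "last word is the family name" rule to two names whose family names are known. These are the same rule the regex answer uses and the name-order and multi-word cases described above. `naive_family_name` is a hypothetical helper, and the two names are illustrative choices that do not come from the original post.

```perl
use 5.010;
use strict;
use warnings;

# The naive rule: the last whitespace-separated word is the family name.
sub naive_family_name {
    my @words = split ' ', shift;
    return $words[-1];
}

# Hand-entered family names that the naive rule gets wrong.
my %family = (
    'Jean-Claude Van Damme' => 'Van Damme',    # multi-word family name
    'Mao Zedong'            => 'Mao',          # family name written first
);

for my $full (sort keys %family) {
    my $guess = naive_family_name($full);
    say "$full: guessed '$guess', actual '$family{$full}'";
}
```

Both guesses are wrong, and no rule based only on word position can fix both at once. That is why storing the full name as one string is the safer default.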
Upvotes: 4
Reputation: 54
No code, but try:
Something like that, anyway...
Upvotes: 3