Reputation: 2633
The requirement is to match people's last names, with a dash separating each last name.
The base RegEx I am using for this is this one:
(?=\S*[-])([a-zA-ZÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù'-]+)
Basically, I am limiting it to Latin alphabet characters, including some accented characters.
This works perfectly fine if I use examples like:
But I forgot to account for the case where the person has only one last name.
I tried doing the following.
((?=\S*[-])([\ a-zA-ZÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù'-]+))|([A-Za-zÑñÁáÉéÍíÓóÚúÄäËëÏïÖöÜüÀàÈèÌìÒòÙù']+)
I added \ (an escaped space) to the allowed characters of the first alternative, plus an alternation (|) that matches a single word without spaces.
And while it works for some cases there are 2 issues.
Regarding point 2, I refer to something like:
The RegEx matches it, but it no longer respects the dash as a separator.
I am not sure how to handle this.
Also, since I added the space, it no longer enforces the requirement for a dash between last names.
I am thinking of limiting the number of spaces inside a last name, for example allowing at most 2 or 3, so that examples like:
Can be valid matches.
I am no pro at RegEx, so some help would be greatly appreciated.
UPDATE
I failed to mention that I need to be able to use this with JavaScript. PHP could be useful too, but I am doing some browser validation and the patterns need to be compatible.
Upvotes: 3
Views: 1275
Reputation: 47764
Logically, you should match one or more letters, then allow a single occurrence of your chosen delimiting characters before allowing another string of one or more letters.
PHP Code: (Demo)
$names = [
'Pérez-González',
'Domínguez-Díaz',
'Güemez-Martínez',
'Johnson-De Sosa',
'Pérez-De la Cruz',
'smith',
'Pérez De la Cruz-González',
'de Gal-O\'Connell',
'Johnson--Johnson'
];
foreach ($names as $name) {
echo "$name is " . (!preg_match("~^\pL+(?:[- ']\pL+)*$~u", $name) ? 'in' : '') . "valid\n";
}
Javascript Code: (snippet is runnable)
let names = [
'Pérez-González',
'Domínguez-Díaz',
'Güemez-Martínez',
'Johnson-De Sosa',
'Pérez-De la Cruz',
'smith',
'Pérez De la Cruz-González',
'de Gal-O\'Connell',
'Johnson--Johnson'
];
for (const name of names) {
    // test() returns a boolean, which is all the validity check needs
    const isValid = /^\p{L}+(?:[- ']\p{L}+)*$/u.test(name);
    document.write("<div>" + name + " is " + (isValid ? '' : 'in') + "valid</div>");
}
This will only allow a single delimiter between sequences of letters. It will fail if someone's name is "Suzy 'Ng", because that has a space followed by an apostrophe (two consecutive delimiters). I don't know whether such a name is possible or real; I just want to make the limitation clear.
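If a name like that does need to pass, one possible relaxation is sketched below. It is only an illustration under an assumption that is not part of the original requirement: a space or dash may be followed by a single apostrophe, and an apostrophe may still appear on its own inside a name. The name relaxedPattern is made up for this example.

```javascript
// Assumption (not from the question): a space or dash may be followed by
// one apostrophe; a lone apostrophe between letters is still allowed.
const relaxedPattern = /^\p{L}+(?:(?:[- ]'?|')\p{L}+)*$/u;

const samples = ["Suzy 'Ng", "de Gal-O'Connell", "Johnson--Johnson"];
for (const sample of samples) {
    console.log(sample + " is " + (relaxedPattern.test(sample) ? "" : "in") + "valid");
}
```

With this variant the first two samples are reported as valid and the double dash is still rejected, because every delimiter group must be followed by at least one letter.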
No lookarounds are necessary.
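Since the question is about browser validation, the same pattern can also be wrapped in a small reusable function. This is a minimal sketch, not a definitive implementation: isValidLastName and LAST_NAME_PATTERN are hypothetical names, and the pattern is the one from the snippets above, unchanged.

```javascript
// Letters, then zero or more groups of (one delimiter + letters).
// The u flag is required for \p{L} (any Unicode letter) to work.
const LAST_NAME_PATTERN = /^\p{L}+(?:[- ']\p{L}+)*$/u;

// Hypothetical helper; returns true when the whole string is a valid name.
function isValidLastName(value) {
    return LAST_NAME_PATTERN.test(value);
}

console.log(isValidLastName("Pérez-De la Cruz")); // true
console.log(isValidLastName("smith"));            // true
console.log(isValidLastName("Johnson--Johnson")); // false
```

Note that \p{L} matches letters only, not combining marks, so input that arrives in decomposed form (a base letter followed by a separate accent character) would be rejected. Calling value.normalize("NFC") before testing is one way to handle that, if it turns out to matter for your data.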
Upvotes: 1