Reputation: 93
I'm looking for a good JavaScript RegEx to convert names to proper case. For example:
John SMITH = John Smith
Mary O'SMITH = Mary O'Smith
E.t MCHYPHEN-SMITH = E.T McHyphen-Smith
John Middlename SMITH = John Middlename Smith
Well you get the idea.
Has anyone come up with a comprehensive solution?
Upvotes: 4
Views: 2618
Reputation: 289
Mark Summerfield has done a comprehensive job of this with Lingua::EN::NameCase:
KEITH Keith
LEIGH-WILLIAMS Leigh-Williams
MCCARTHY McCarthy
O'CALLAGHAN O'Callaghan
ST. JOHN St. John
VON STREIT von Streit
VAN DYKE van Dyke
AP LLWYD DAFYDD ap Llwyd Dafydd
henry viii Henry VIII
louis xiv Louis XIV
The above is written in Perl, but it makes heavy use of regular expressions, so you should be able to glean some good techniques.
Here is the relevant source:
sub nc {
    croak "Usage: nc [[\\]\$SCALAR]"
        if scalar @_ > 1 or ( ref $_[0] and ref $_[0] ne 'SCALAR' ) ;
    local( $_ ) = @_ if @_ ;
    $_ = ${$_} if ref( $_ ) ; # Replace reference with value.
    $_ = lc ; # Lowercase the lot.
    s{ \b (\w) }{\u$1}gox ; # Uppercase first letter of every word.
    s{ (\'\w) \b }{\L$1}gox ; # Lowercase 's.
    # Name case Mcs and Macs - taken straight from NameParse.pm incl. comments.
    # Exclude names with 1-2 letters after prefix like Mack, Macky, Mace
    # Exclude names ending in a,c,i,o, or j are typically Polish or Italian
    if ( /\bMac[A-Za-z]{2,}[^aciozj]\b/o or /\bMc/o ) {
        s/\b(Ma?c)([A-Za-z]+)/$1\u$2/go ;
        # Now correct for "Mac" exceptions
        s/\bMacEvicius/Macevicius/go ; # Lithuanian
        s/\bMacHado/Machado/go ; # Portuguese
        s/\bMacHar/Machar/go ;
        s/\bMacHin/Machin/go ;
        s/\bMacHlin/Machlin/go ;
        s/\bMacIas/Macias/go ;
        s/\bMacIulis/Maciulis/go ;
        s/\bMacKie/Mackie/go ;
        s/\bMacKle/Mackle/go ;
        s/\bMacKlin/Macklin/go ;
        s/\bMacQuarie/Macquarie/go ;
        s/\bMacOmber/Macomber/go ;
        s/\bMacIn/Macin/go ;
        s/\bMacKintosh/Mackintosh/go ;
        s/\bMacKen/Macken/go ;
        s/\bMacHen/Machen/go ;
        s/\bMacisaac/MacIsaac/go ;
        s/\bMacHiel/Machiel/go ;
        s/\bMacIol/Maciol/go ;
        s/\bMacKell/Mackell/go ;
        s/\bMacKlem/Macklem/go ;
        s/\bMacKrell/Mackrell/go ;
        s/\bMacLin/Maclin/go ;
        s/\bMacKey/Mackey/go ;
        s/\bMacKley/Mackley/go ;
        s/\bMacHell/Machell/go ;
        s/\bMacHon/Machon/go ;
    }
    s/Macmurdo/MacMurdo/go ;
    # Fixes for "son (daughter) of" etc. in various languages.
    s{ \b Al(?=\s+\w) }{al}gox ; # al Arabic or forename Al.
    s{ \b Ap \b }{ap}gox ; # ap Welsh.
    s{ \b Ben(?=\s+\w) }{ben}gox ; # ben Hebrew or forename Ben.
    s{ \b Dell([ae])\b }{dell$1}gox ; # della and delle Italian.
    s{ \b D([aeiu]) \b }{d$1}gox ; # da, de, di Italian; du French.
    s{ \b De([lr]) \b }{de$1}gox ; # del Italian; der Dutch/Flemish.
    s{ \b El \b }{el}gox unless $SPANISH ; # el Greek or El Spanish.
    s{ \b La \b }{la}gox unless $SPANISH ; # la French or La Spanish.
    s{ \b L([eo]) \b }{l$1}gox ; # lo Italian; le French.
    s{ \b Van(?=\s+\w) }{van}gox ; # van German or forename Van.
    s{ \b Von \b }{von}gox ; # von Dutch/Flemish
    # Fixes for roman numeral names, e.g. Henry VIII, up to 89, LXXXIX
    s{ \b ( (?: [Xx]{1,3} | [Xx][Ll] | [Ll][Xx]{0,3} )?
            (?: [Ii]{1,3} | [Ii][VvXx] | [Vv][Ii]{0,3} )? ) \b }{\U$1}gox ;
    $_ ;
}
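For the JavaScript side, here is a rough sketch of how a few of those steps might translate. It is not a full port: the function name nameCase is made up for illustration, the long list of Mac exceptions is omitted, and only the ap/van/von particle rules are carried over.

```javascript
// Partial, unofficial JavaScript port of a few steps from the Perl nc() routine.
// "nameCase" is a hypothetical name; the Mac exception list and most of the
// particle rules (al, ben, de, la, ...) are deliberately left out.
function nameCase(name) {
  var upper = function (m) { return m.toUpperCase(); };
  var s = name.toLowerCase();                 // Lowercase the lot.
  s = s.replace(/\b\w/g, upper);              // Uppercase first letter of every word.
  s = s.replace(/'\w\b/g, function (m) {      // Lowercase a trailing 's.
    return m.toLowerCase();
  });
  // Mc/Mac prefixes, guarded by the same two tests as the Perl code.
  if (/\bMac[A-Za-z]{2,}[^aciozj]\b/.test(s) || /\bMc/.test(s)) {
    s = s.replace(/\b(Ma?c)([A-Za-z]+)/g, function (m, prefix, rest) {
      return prefix + rest.charAt(0).toUpperCase() + rest.slice(1);
    });
  }
  s = s.replace(/\bAp\b/g, 'ap');             // ap
  s = s.replace(/\bVan(?=\s+\w)/g, 'van');    // van, only when another word follows
  s = s.replace(/\bVon\b/g, 'von');           // von
  // Roman numerals up to LXXXIX, e.g. "Viii" becomes "VIII".
  s = s.replace(
    /\b((?:x{1,3}|xl|lx{0,3})?(?:i{1,3}|i[vx]|vi{0,3})?)\b/gi, upper);
  return s;
}

console.log(nameCase('MCCARTHY'));    // McCarthy
console.log(nameCase('VON STREIT'));  // von Streit
console.log(nameCase('henry viii'));  // Henry VIII
```

Like the Perl original, the roman numeral step will also uppercase short words that happen to look like numerals (for example a surname "Li"), so treat it as a starting point.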
Upvotes: 1
Reputation: 103495
Wimps!.... Here's my second attempt. It handles "John SMITH", "Mary O'SMITH", "John Middlename SMITH", "E.t MCHYPHEN-SMITH" and "JoHn-JOE MacDoNAld".
// Verbatim string so \w is not treated as a C# escape; IgnoreCase so "Mac", "MAC" and "mc" all match.
Regex fixnames = new Regex(@"(Ma?c)?(\w)(\w*)(\W*)", RegexOptions.IgnoreCase);
string newName = fixnames.Replace(badName, NameFixer);

static public string NameFixer(Match match)
{
    string mc = "";
    if (match.Groups[1].Captures.Count > 0)
    {
        // Normalize the prefix: a three-letter match is "Mac", otherwise "Mc".
        if (match.Groups[1].Captures[0].Length == 3)
            mc = "Mac";
        else
            mc = "Mc";
    }
    return
        mc
        + match.Groups[2].Captures[0].Value.ToUpper()
        + match.Groups[3].Captures[0].Value.ToLower()
        + match.Groups[4].Captures[0].Value;
}
NOTE: By the time I realized you wanted a JavaScript solution instead of a .NET one, I was having too much fun to stop....
Upvotes: 1
Reputation: 89171
Something like this?
function fix_name(name) {
    var replacer = function (whole, prefix, word) {
        var ret = [];  // declared with var so it does not leak into the global scope
        if (prefix) {
            // Optional "Mc"/"Mac" prefix: capitalize it separately.
            ret.push(prefix.charAt(0).toUpperCase());
            ret.push(prefix.substr(1).toLowerCase());
        }
        // Capitalize the rest of the word.
        ret.push(word.charAt(0).toUpperCase());
        ret.push(word.substr(1).toLowerCase());
        return ret.join('');
    };
    var pattern = /\b(ma?c)?([a-z]+)/ig;
    return name.replace(pattern, replacer);
}
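One weak spot: names such as Machado or Mackie also match the ma?c prefix, so the function above turns them into MacHado and MacKie. A possible workaround, assuming a small exception list is acceptable, is sketched here. The names fixNameWithExceptions and MAC_EXCEPTIONS are hypothetical, and the list is only a small sample borrowed from the Perl source quoted in another answer.

```javascript
// Hypothetical exception list: surnames where "Mac" is not a separate prefix.
// Only a sample; a real list would need to be much longer.
var MAC_EXCEPTIONS = ['machado', 'machin', 'mackie', 'macklin', 'macomber'];

// Capitalize the first letter and lowercase the rest.
function capitalize(s) {
  return s.charAt(0).toUpperCase() + s.slice(1).toLowerCase();
}

function fixNameWithExceptions(name) {
  return name.replace(/\b(ma?c)?([a-z]+)/ig, function (whole, prefix, word) {
    // No prefix, or a listed exception: treat the whole match as one word.
    if (!prefix || MAC_EXCEPTIONS.indexOf(whole.toLowerCase()) !== -1) {
      return capitalize(whole);
    }
    // Otherwise capitalize the prefix and the remainder separately.
    return capitalize(prefix) + capitalize(word);
  });
}

console.log(fixNameWithExceptions('ANA MACHADO'));         // Ana Machado
console.log(fixNameWithExceptions('JoHn-JOE MacDoNAld'));  // John-Joe MacDonald
```

The lookup is an exact match on the lowercased word, which is simpler than the prefix-based corrections in the Perl code but also catches fewer names.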
Upvotes: 2
Reputation: 93
Agreed, it will never be perfect, but I'm looking to handle the most common cases. That pretty much means capitalizing the first letter of every "word", treating hyphens and apostrophes the same as spaces.
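In code, that simpler rule might look something like the sketch below. The name capitalizeWords is just a placeholder. It treats any run of ASCII letters as a word, so spaces, hyphens, apostrophes and periods all act as separators, and it makes no attempt at Mc/Mac prefixes or accented letters.

```javascript
// Minimal sketch: capitalize every run of ASCII letters.
// Anything that is not a letter (space, hyphen, apostrophe, period) is left
// untouched and simply separates one word from the next.
function capitalizeWords(name) {
  return name.replace(/[a-z]+/ig, function (word) {
    return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
  });
}

console.log(capitalizeWords("John SMITH"));          // John Smith
console.log(capitalizeWords("Mary O'SMITH"));        // Mary O'Smith
console.log(capitalizeWords("E.t MCHYPHEN-SMITH"));  // E.T Mchyphen-Smith
```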
Upvotes: 0
Reputation: 10645
Unfortunately there are too many different name formats to do this correctly. John-Joe MacDonald is always going to be a nuisance!
Upvotes: 0