raccettura

Reputation: 93

JS Regex For Human Names

I'm looking for a good JavaScript regex to convert names to proper case. For example:

John SMITH = John Smith

Mary O'SMITH = Mary O'Smith

E.t MCHYPHEN-SMITH = E.T McHyphen-Smith  

John Middlename SMITH = John Middlename Smith

Well, you get the idea.

Has anyone come up with a comprehensive solution?

Upvotes: 4

Views: 2618

Answers (5)

Robert Krimen

Reputation: 289

Mark Summerfield has done a comprehensive job of this with Lingua::EN::NameCase:

KEITH               Keith
LEIGH-WILLIAMS      Leigh-Williams
MCCARTHY            McCarthy
O'CALLAGHAN         O'Callaghan
ST. JOHN            St. John
VON STREIT          von Streit
VAN DYKE            van Dyke
AP LLWYD DAFYDD     ap Llwyd Dafydd
henry viii          Henry VIII
louis xiv           Louis XIV

The above is written in Perl, but it makes heavy use of regular expressions, so you should be able to glean some good techniques.

Here is the relevant source:

sub nc {

    croak "Usage: nc [[\\]\$SCALAR]"
        if scalar @_ > 1 or ( ref $_[0] and ref $_[0] ne 'SCALAR' ) ;

    local( $_ ) = @_ if @_ ;
    $_ = ${$_} if ref( $_ ) ;           # Replace reference with value.

    $_ = lc ;                           # Lowercase the lot.
    s{ \b (\w)   }{\u$1}gox ;           # Uppercase first letter of every word.
    s{ (\'\w) \b }{\L$1}gox ;           # Lowercase 's.

    # Name case Mcs and Macs - taken straight from NameParse.pm incl. comments.
    # Exclude names with 1-2 letters after prefix like Mack, Macky, Mace
    # Exclude names ending in a,c,i,o, or j are typically Polish or Italian

    if ( /\bMac[A-Za-z]{2,}[^aciozj]\b/o or /\bMc/o ) {
        s/\b(Ma?c)([A-Za-z]+)/$1\u$2/go ;

        # Now correct for "Mac" exceptions
        s/\bMacEvicius/Macevicius/go ;  # Lithuanian
        s/\bMacHado/Machado/go ;        # Portuguese
        s/\bMacHar/Machar/go ;
        s/\bMacHin/Machin/go ;
        s/\bMacHlin/Machlin/go ;
        s/\bMacIas/Macias/go ;  
        s/\bMacIulis/Maciulis/go ;  
        s/\bMacKie/Mackie/go ;
        s/\bMacKle/Mackle/go ;
        s/\bMacKlin/Macklin/go ;
        s/\bMacQuarie/Macquarie/go ;
        s/\bMacOmber/Macomber/go ;
        s/\bMacIn/Macin/go ;
        s/\bMacKintosh/Mackintosh/go ;
        s/\bMacKen/Macken/go ;
        s/\bMacHen/Machen/go ;
        s/\bMacisaac/MacIsaac/go ;
        s/\bMacHiel/Machiel/go ;
        s/\bMacIol/Maciol/go ;
        s/\bMacKell/Mackell/go ;
        s/\bMacKlem/Macklem/go ;
        s/\bMacKrell/Mackrell/go ;
        s/\bMacLin/Maclin/go ;
        s/\bMacKey/Mackey/go ;
        s/\bMacKley/Mackley/go ;
        s/\bMacHell/Machell/go ;
        s/\bMacHon/Machon/go ;
    }
    s/Macmurdo/MacMurdo/go ;

    # Fixes for "son (daughter) of" etc. in various languages.
    s{ \b Al(?=\s+\w)  }{al}gox ;   # al Arabic or forename Al.
    s{ \b Ap        \b }{ap}gox ;       # ap Welsh.
    s{ \b Ben(?=\s+\w) }{ben}gox ;  # ben Hebrew or forename Ben.
    s{ \b Dell([ae])\b }{dell$1}gox ;   # della and delle Italian.
    s{ \b D([aeiu]) \b }{d$1}gox ;      # da, de, di Italian; du French.
    s{ \b De([lr])  \b }{de$1}gox ;     # del Italian; der Dutch/Flemish.
    s{ \b El        \b }{el}gox unless $SPANISH ;   # el Greek or El Spanish.
    s{ \b La        \b }{la}gox unless $SPANISH ;   # la French or La Spanish.
    s{ \b L([eo])   \b }{l$1}gox ;      # lo Italian; le French.
    s{ \b Van(?=\s+\w) }{van}gox ;  # van German or forename Van.
    s{ \b Von       \b }{von}gox ;  # von Dutch/Flemish

    # Fixes for roman numeral names, e.g. Henry VIII, up to 89, LXXXIX
    s{ \b ( (?: [Xx]{1,3} | [Xx][Ll]   | [Ll][Xx]{0,3} )?
            (?: [Ii]{1,3} | [Ii][VvXx] | [Vv][Ii]{0,3} )? ) \b }{\U$1}gox ;

    $_ ;
}
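For anyone who wants this in JavaScript, here is a rough sketch porting a few of the steps above: lowercase everything, capitalize each word, fix a trailing 's, capitalize after "Mc", lowercase some particles, and uppercase Roman numerals. It is a sketch, not a faithful translation: the `nameCase` name is made up, the Mac heuristic, the Mac exception list and the `$SPANISH` switch are omitted, and the particle list is only a subset. It also inherits the Perl rules' false positives: a forename "Van" gets lowercased, and a surname like "Li" or "Liv" is read as a Roman numeral. JavaScript's `\b` and `\w` are ASCII-only, so accented names need more work.

```javascript
// Partial JavaScript port of the Perl nc() routine (a sketch, not a faithful
// translation). Assumes ASCII input: \b and \w do not know accented letters.
function nameCase(name) {
  // Lowercase the lot, then uppercase the first letter of every word.
  let s = name.toLowerCase().replace(/\b\w/g, (c) => c.toUpperCase());

  // Lowercase a single letter after an apostrophe at the end of a word,
  // so a possessive "'S" becomes "'s" while "O'Callaghan" is untouched.
  s = s.replace(/('\w)\b/g, (m) => m.toLowerCase());

  // Capitalize the letter after "Mc". The Perl code also handles "Mac",
  // guarded by a heuristic and an exception list; omitted here.
  s = s.replace(/\bMc([a-z])/g, (_, c) => "Mc" + c.toUpperCase());

  // Lowercase "son of"-style particles, but only when another word follows
  // (a subset of the Perl list).
  s = s.replace(/\b(Ap|Da|De|Del|Der|Di|Du|Van|Von)(?=\s+\w)/g, (p) => p.toLowerCase());

  // Uppercase Roman numerals up to 89 (LXXXIX), same pattern as the Perl code.
  s = s.replace(
    /\b((?:x{1,3}|xl|lx{0,3})?(?:i{1,3}|i[vx]|vi{0,3})?)\b/gi,
    (m) => m.toUpperCase()
  );
  return s;
}
```

With this sketch, `nameCase("henry viii")` gives "Henry VIII" and `nameCase("VON STREIT")` gives "von Streit".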

Upvotes: 1

James Curran

Reputation: 103495

Wimps!... Here's my second attempt. It handles "John SMITH", "Mary O'SMITH", "John Middlename SMITH", "E.t MCHYPHEN-SMITH" and "JoHn-JOE MacDoNAld".

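using System.Text.RegularExpressions;

// Verbatim string so that \w and \W reach the regex engine; IgnoreCase so
// that "MC", "Mc", "MAC" and "Mac" are all recognised as prefixes.
Regex fixnames = new Regex(@"(Ma?C)?(\w)(\w*)(\W*)", RegexOptions.IgnoreCase);
string newName = fixnames.Replace(badName, NameFixer);


// Called once per match: optional Mc/Mac prefix, first letter, rest of the
// word, then any trailing non-word characters such as a space, hyphen or apostrophe.
static public string NameFixer(Match match) 
{
    string mc = "";
    if (match.Groups[1].Captures.Count > 0)
    {
        // Normalise the prefix by length: three letters is "Mac", two is "Mc".
        if (match.Groups[1].Captures[0].Length == 3)
            mc = "Mac";
        else
            mc = "Mc";
    }

    return 
       mc
      +match.Groups[2].Captures[0].Value.ToUpper()
      +match.Groups[3].Captures[0].Value.ToLower()
      +match.Groups[4].Captures[0].Value;
}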

NOTE: By the time I realized you wanted a JavaScript solution instead of a .NET one, I was having too much fun to stop...
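Since the question asks for JavaScript, here is a sketch of the same evaluator idea in that language (the `fixNames` name is made up). `String.prototype.replace` accepts a callback that plays the role of `MatchEvaluator`: it receives the whole match followed by each capture group, with `undefined` for an optional group that did not participate. The `i` flag makes the prefix match case-insensitive, so both "MCHYPHEN" and "MacDoNAld" have their prefix recognised.

```javascript
// JavaScript counterpart of the .NET MatchEvaluator approach.
function fixNames(badName) {
  // Optional Mc/Mac prefix, first letter, rest of the word, trailing
  // non-word characters (space, hyphen, apostrophe, period...).
  const pattern = /(ma?c)?(\w)(\w*)(\W*)/gi;
  return badName.replace(pattern, (whole, prefix, first, rest, sep) => {
    // Normalise the prefix by its length: three letters "Mac", two "Mc".
    const mc = prefix ? (prefix.length === 3 ? "Mac" : "Mc") : "";
    return mc + first.toUpperCase() + rest.toLowerCase() + sep;
  });
}
```

Like the C# version, it splits any word that begins with "mac" or "mc", so "MACK" comes out as "MacK".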

Upvotes: 1

Markus Jarderot

Reputation: 89171

Something like this?

function fix_name(name) {
    // Called once per match with the whole match, the optional Mc/Mac
    // prefix (undefined if absent) and the rest of the word.
    var replacer = function (whole, prefix, word) {
        var ret = [];   // pieces of the re-cased word
        if (prefix) {
            ret.push(prefix.charAt(0).toUpperCase());
            ret.push(prefix.slice(1).toLowerCase());
        }
        ret.push(word.charAt(0).toUpperCase());
        ret.push(word.slice(1).toLowerCase());
        return ret.join('');
    };
    var pattern = /\b(ma?c)?([a-z]+)/ig;
    return name.replace(pattern, replacer);
}
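One weakness of the pattern above is that any word beginning with "mac" is split, so "MACK" becomes "MacK". Below is a hedged variant (the name `fixNameGuarded` is made up) that borrows the heuristic from the Perl Lingua::EN::NameCase code: treat "Mac" as a prefix only if at least three letters follow and the last one is not a, c, i, o, z or j. Unlike the Perl code, which tests the whole string once, this applies the test per word. Names such as "Mackie" still slip through and would need an exception list, as the Perl module has.

```javascript
// Variant of fix_name that only treats "Mac" as a prefix when the rest of
// the word looks like a surname stem: at least three letters, and the last
// letter is not a, c, i, o, z or j (heuristic adapted from the Perl module).
// "Mc" is always treated as a prefix.
function fixNameGuarded(name) {
  // Uppercase the first character, lowercase the rest.
  const cap = (s) => s.charAt(0).toUpperCase() + s.slice(1).toLowerCase();

  return name.replace(/\b(ma?c)?([a-z]+)/gi, (whole, prefix, word) => {
    const looksLikeStem = word.length >= 3 && !/[aciozj]$/i.test(word);
    if (prefix && prefix.length === 3 && !looksLikeStem) {
      // e.g. "Mack", "Macy", "Machado": "Mac" is part of the name itself.
      return cap(whole);
    }
    return prefix ? cap(prefix) + cap(word) : cap(word);
  });
}
```

For example, "MACK MACDONALD" comes out as "Mack MacDonald".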

Upvotes: 2

raccettura

Reputation: 93

Agreed, it will never be perfect, but I'm looking to handle the most common cases. That is pretty much title-casing every "word", and I guess treating hyphens and apostrophes as word separators, like spaces.
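The rule described here can be sketched in a few lines of JavaScript (the name `titleCaseName` is made up): lowercase everything, then uppercase any letter at the start of the string or right after a space, hyphen or apostrophe. Periods are also treated as separators, an addition so that "E.t" becomes "E.T". There is no Mc/Mac handling, so "MCHYPHEN" comes out as "Mchyphen".

```javascript
// Title-case a name, treating spaces, hyphens, apostrophes and periods as
// word separators. No special handling for Mc/Mac or particles.
function titleCaseName(name) {
  return name
    .toLowerCase()
    .replace(/(^|[\s'.-])([a-z])/g, (whole, sep, letter) => sep + letter.toUpperCase());
}
```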

Upvotes: 0

harriyott

Reputation: 10645

Unfortunately, there are too many different name formats to do this correctly. John-Joe MacDonald is always going to be a nuisance!

Upvotes: 0
