Darryl Hein

Reputation: 145027

Does anyone have a PHP function to properly capitalize people's names?

I'm looking for a function to properly capitalize names like McDonald, FitzGerald, MacArthur, O'Lunney's, Theo de Raadt, etc.

Does anyone know of one that works reasonably well? I'm guessing no function is going to support every possibility.

Of course, ucwords alone doesn't work for this, because it just capitalizes the first letter of every word.
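To make the problem concrete, here is roughly what the usual strtolower + ucwords combination does to all-caps input. By default ucwords only treats whitespace as a word break, so prefixes like "Mc" and the letter after an apostrophe are never capitalized. The helper name naive_case is just for illustration.

```php
<?php
// Naive approach: lowercase everything, then uppercase the first letter of
// each whitespace-separated word. Prefixes inside a word are not word breaks.
function naive_case(string $name): string
{
    return ucwords(strtolower($name));
}

foreach (['RONALD MCDONALD', "PATRICK O'LUNNEY", 'THEO DE RAADT'] as $raw) {
    echo naive_case($raw), PHP_EOL;
}
// Ronald Mcdonald
// Patrick O'lunney
// Theo De Raadt
```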

Edit: I know there are going to be problems and that not every possibility will be supported. But the issue right now is that I have a database of around 50,000 names, mostly entered in all caps, and it would be a pain in the ass to edit each one by hand without introducing spelling errors. Even a script that gets 20% of them wrong would be a whole lot faster and result in far fewer errors.
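For a bulk clean-up like this, one option is to re-case only the values that are entirely upper case and leave mixed-case entries alone, on the assumption that those were typed deliberately. The sketch below does that and returns the changed rows so they can be spot-checked. The function name recase_all_caps, the id => name array shape, and the pluggable casing callback are all assumptions for illustration; any of the functions from the answers could be passed in.

```php
<?php
/**
 * Hypothetical helper: re-case only names that are entirely upper case,
 * leaving mixed-case entries (assumed to be deliberate) untouched.
 * Returns [id => [old, new]] for the rows that changed, for manual review.
 *
 * @param array<int|string, string> $names  id => stored name
 * @param callable(string): string  $recase the name-casing function to apply
 */
function recase_all_caps(array $names, callable $recase): array
{
    $changes = [];
    foreach ($names as $id => $name) {
        // strtoupper() works byte by byte, so this check is only reliable for ASCII letters.
        if ($name !== strtoupper($name)) {
            continue; // already mixed case: leave it alone
        }
        $new = $recase($name);
        if ($new !== $name) {
            $changes[$id] = [$name, $new];
        }
    }
    return $changes;
}

// Example with a deliberately simple casing callback.
$rows = [1 => 'JOHN SMITH', 2 => 'Theo de Raadt', 3 => 'MARY JONES'];
$changes = recase_all_caps($rows, fn (string $n): string => ucwords(strtolower($n)));
print_r($changes);
```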

Upvotes: 11

Views: 3058

Answers (4)

Tim Visée

Reputation: 3248

I came up with this:

/**
  * Normalize the given (partial) name of a person.
  *
  * - re-capitalize, taking last-name prefixes (e.g. "van", "de") into account
  * - collapse excess whitespace
  *
  * Snippet from: https://timvisee.com/blog/snippet-correctly-capitalize-names-in-php
  *
  * @param string $name The input name.
  * @return string The normalized name.
  */
function name_case($name) {
    // A list of properly cased parts
    $CASED = [
      "O'", "l'", "d'", 'St.', 'Mc', 'the', 'van', 'het', 'in', "'t", 'ten',
      'den', 'von', 'und', 'der', 'de', 'da', 'of', 'and', 'III', 'IV',
      'VI', 'VII', 'VIII', 'IX',
    ];

    // Collapse whitespace runs to one space; append a space so the last part is followed by a separator
    $name = preg_replace('/\s+/', ' ', $name) . ' ';

    // Break name up into parts split by name separators
    $parts = preg_split('/( |-|O\'|l\'|d\'|St\\.|Mc)/i', $name, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Chunk parts, use $CASED or uppercase first, remove unfinished chunks
    $parts = array_chunk($parts, 2);
    $parts = array_filter($parts, function($part) {
            return sizeof($part) == 2;
        });
    $parts = array_map(function($part) use($CASED) {
            // Split the chunk into its name and separator parts
            list($name, $separator) = $part;

            // Use specified case for separator if set
            $cased = current(array_filter($CASED, function($i) use($separator) {
                return strcasecmp($i, $separator) == 0;
            }));
            $separator = $cased ? $cased : $separator;

            // Choose specified part case, or uppercase first as default
            $cased = current(array_filter($CASED, function($i) use($name) {
                return strcasecmp($i, $name) == 0;
            }));
            return [$cased ? $cased : ucfirst(strtolower($name)), $separator];
        }, $parts);
    $parts = array_map(function($part) {
            return implode($part);
        }, $parts);
    $name = implode($parts);

    // Trim and return normalized name
    return trim($name);
}

It uses a list of parts whose casing is assumed to be correct. It will never be perfect, but it might improve things for your implementation.
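The least obvious step in name_case() is the splitting. Because the separators are in a capturing group, preg_split() with PREG_SPLIT_DELIM_CAPTURE returns pieces and separators alternately, and array_chunk() then pairs each piece with the separator that follows it. The sketch below runs the same pattern on a made-up sample name to show the intermediate arrays.

```php
<?php
// Same separator pattern as name_case(): space, hyphen and the prefixes
// O', l', d', St. and Mc, matched case-insensitively (the /i flag).
$pattern = '/( |-|O\'|l\'|d\'|St\\.|Mc)/i';

// The trailing space guarantees the last piece is followed by a separator.
$name = "MCDONALD-O'BRIEN ";

$parts = preg_split($pattern, $name, -1, PREG_SPLIT_DELIM_CAPTURE);
// ["", "MC", "DONALD", "-", "", "O'", "BRIEN", " ", ""]

// Pair each piece with its separator; the final lone [""] chunk is dropped.
$pairs = array_filter(array_chunk($parts, 2), fn (array $p): bool => count($p) === 2);

echo json_encode(array_values($pairs)), PHP_EOL;
// [["","MC"],["DONALD","-"],["","O'"],["BRIEN"," "]]
```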

Upvotes: 1

user894932

Reputation:

Generally I use

$output = trim(implode('-', array_map('ucfirst', explode('-', ucwords(strtolower(str_replace('_',' ',$input)))))));

It's handy if you store _ instead of spaces in your DB or use them in URLs, and it handles hyphenated names well too.
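Here is the one-liner above wrapped in a function, with a sample input. As an alternative, ucwords() has accepted an optional second argument listing the word delimiters since PHP 5.4.32 / 5.5.16, so adding "-" to the default whitespace delimiters should give the same result for typical input without the explode/implode round trip. Both function names are just for illustration.

```php
<?php
// The one-liner from this answer, wrapped in a function.
function underscore_hyphen_case(string $input): string
{
    return trim(implode('-', array_map('ucfirst', explode('-', ucwords(strtolower(str_replace('_', ' ', $input)))))));
}

// Same idea using ucwords()' delimiters argument: default whitespace plus "-".
function underscore_hyphen_case_delims(string $input): string
{
    return trim(ucwords(strtolower(str_replace('_', ' ', $input)), " \t\r\n\f\v-"));
}

echo underscore_hyphen_case('MARY-JANE_WATSON'), PHP_EOL;        // Mary-Jane Watson
echo underscore_hyphen_case_delims('MARY-JANE_WATSON'), PHP_EOL; // Mary-Jane Watson
```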

I also saw this somewhere, and it seems to do a good job in most cases:

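    /**
     * Title-case a person's name, handling common prefixes and exceptions.
     *
     * @param string $string The name to re-case.
     * @return string The re-cased name.
     */
    public function titleCase($string)
    {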
        $word_splitters = array(' ', '-', "O'", "L'", "D'", 'St.', 'Mc', 'Mac');
        $lowercase_exceptions = array('the', 'van', 'den', 'von', 'und', 'der', 'de', 'di', 'da', 'of', 'and', "l'", "d'");
        $uppercase_exceptions = array('III', 'IV', 'VI', 'VII', 'VIII', 'IX');

        $string = strtolower($string);
        foreach ($word_splitters as $delimiter) {
            $words = explode($delimiter, $string);
            $newwords = array();
            foreach ($words as $word) {
                if (in_array(strtoupper($word), $uppercase_exceptions))
                    $word = strtoupper($word);
                else
                    if (!in_array($word, $lowercase_exceptions))
                        $word = ucfirst($word);

                $newwords[] = $word;
            }

            if (in_array(strtolower($delimiter), $lowercase_exceptions))
                $delimiter = strtolower($delimiter);

            $string = join($delimiter, $newwords);
        }
        return $string;
    }

Names like Jurgen Macho (a footballer) come out as Jurgen MacHo, though; as pointed out in other answers and comments, names are hard.
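One way to soften the MacHo problem is to check a word against an exception list before treating "Mac" or "Mc" as a prefix. The sketch below does this for a single word. The function name, the sample exception list, and the rule requiring at least two letters after the prefix are all assumptions; a real exception list would need curating against your own data.

```php
<?php
/**
 * Hypothetical helper: capitalize one word, treating a leading "mac"/"mc" as a
 * prefix (MacArthur, McDonald) unless the word is on an exception list of
 * names that merely start with those letters. The default list is illustrative.
 */
function case_mac_word(string $word, array $exceptions = ['macho', 'machado']): string
{
    $lower = strtolower($word);
    if (in_array($lower, $exceptions, true)) {
        return ucfirst($lower);
    }
    foreach (['mac', 'mc'] as $prefix) {
        $len = strlen($prefix);
        // Require at least two letters after the prefix, so "Mac" and "Macy" stay as-is.
        if (strncmp($lower, $prefix, $len) === 0 && strlen($lower) > $len + 1) {
            return ucfirst($prefix) . ucfirst(substr($lower, $len));
        }
    }
    return ucfirst($lower);
}

echo case_mac_word('MACARTHUR'), ' ', case_mac_word('MACHO'), PHP_EOL; // MacArthur Macho
```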

Upvotes: 2

inakiabt

Reputation: 1963

Maybe you need something like this note on the ucwords function.

Upvotes: 6

John Carter

Reputation: 55271

You're probably aware of this, but one huge problem you'll face is that there's more than one "correct" capitalisation of some names - in your example I'd disagree with FitzGerald, for example.

Upvotes: 3
