Reputation: 145027
I'm looking for a function to properly capitalize names like McDonald, FitzGerald, MacArthur, O'Lunney's, Theo de Raadt, etc.
Does anyone know of one that works reasonably well? I'm guessing no function is going to support every possibility.
Of course ucwords alone doesn't work for this because it just capitalizes the first letter of every word.
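To make the limitation concrete, here is a minimal sketch of what lowercasing and then calling ucwords does to a few of the names above. The sample inputs are just the question's examples in all caps; nothing else is assumed.

```php
<?php
// ucwords() uppercases the first letter of each whitespace-separated word
// and leaves the rest of the word untouched.
$names = ["MCDONALD", "O'LUNNEY'S", "THEO DE RAADT"];
$result = [];
foreach ($names as $name) {
    $result[] = ucwords(strtolower($name));
}
print_r($result); // "Mcdonald", "O'lunney's", "Theo De Raadt"
```

The inner capitals (McDonald, O'Lunney's) are lost and the lowercase particle "de" gets capitalized, which is exactly the gap a dedicated function has to fill.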
Edit: I know there are going to be problems and that not every possibility is going to be supported. But the issue right now is that I have a database of around 50,000 names that are mostly entered in all caps, and it would be a pain in the ass to edit each one by hand without introducing spelling errors. Having a script that causes a problem with 20% of them would be a whole lot faster and result in a lot fewer errors.
Upvotes: 11
Views: 3058
Reputation: 3248
I came up with this:
/**
 * Normalize the given (partial) name of a person.
 *
 * - re-capitalize, take last name inserts into account
 * - remove excess white spaces
 *
 * Snippet from: https://timvisee.com/blog/snippet-correctly-capitalize-names-in-php
 *
 * @param string $name The input name.
 * @return string The normalized name.
 */
function name_case($name) {
    // A list of properly cased parts
    $CASED = [
        "O'", "l'", "d'", 'St.', 'Mc', 'the', 'van', 'het', 'in', "'t", 'ten',
        'den', 'von', 'und', 'der', 'de', 'da', 'of', 'and', 'the', 'III', 'IV',
        'VI', 'VII', 'VIII', 'IX',
    ];

    // Trim whitespace sequences to one space, append space to properly chunk
    $name = preg_replace('/\s+/', ' ', $name) . ' ';

    // Break name up into parts split by name separators
    $parts = preg_split('/( |-|O\'|l\'|d\'|St\\.|Mc)/i', $name, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Chunk parts, use $CASED or uppercase first, remove unfinished chunks
    $parts = array_chunk($parts, 2);
    $parts = array_filter($parts, function($part) {
        return sizeof($part) == 2;
    });
    $parts = array_map(function($part) use($CASED) {
        // Extract to name and separator part
        list($name, $separator) = $part;

        // Use specified case for separator if set
        $cased = current(array_filter($CASED, function($i) use($separator) {
            return strcasecmp($i, $separator) == 0;
        }));
        $separator = $cased ? $cased : $separator;

        // Choose specified part case, or uppercase first as default
        $cased = current(array_filter($CASED, function($i) use($name) {
            return strcasecmp($i, $name) == 0;
        }));
        return [$cased ? $cased : ucfirst(strtolower($name)), $separator];
    }, $parts);
    $parts = array_map(function($part) {
        return implode($part);
    }, $parts);
    $name = implode($parts);

    // Trim and return normalized name
    return trim($name);
}
It uses a list of parts whose casing is assumed to be correct. It will never be perfect, but it might improve things for your implementation.
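The splitting step is probably the least obvious part, so here is a small sketch of just that step in isolation. It uses the same separator pattern as the function; the sample input and the variable names `$pieces` and `$pairs` are only for illustration.

```php
<?php
// Same separator pattern as in name_case(). PREG_SPLIT_DELIM_CAPTURE keeps
// the matched separators in the result so they can be re-cased later.
$pattern = '/( |-|O\'|l\'|d\'|St\\.|Mc)/i';

// The trailing space mimics the one name_case() appends before splitting.
$pieces = preg_split($pattern, 'MCDONALD-SMITH ', -1, PREG_SPLIT_DELIM_CAPTURE);
// $pieces: ['', 'MC', 'DONALD', '-', 'SMITH', ' ', '']

// Pair each name fragment with the separator that follows it.
$pairs = array_chunk($pieces, 2);
// $pairs: [['', 'MC'], ['DONALD', '-'], ['SMITH', ' '], ['']]
print_r($pairs);
```

The last chunk has only one element, which is why the function filters out chunks whose size is not 2, and the appended space is what guarantees the final name fragment gets a separator to pair with.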
Upvotes: 1
Reputation:
Generally I use
$output = trim(implode('-', array_map('ucfirst', explode('-', ucwords(strtolower(str_replace('_',' ',$input)))))));
This is handy if you store _ instead of spaces in your DB or use them in URLs, and it handles hyphenated names well too.
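Broken into steps, the one-liner looks roughly like this. The wrapper name `simple_name_case` and the sample inputs are made up for the example; the logic is meant to be the same as the expression above.

```php
<?php
// Hypothetical wrapper around the one-liner, split up for readability.
function simple_name_case(string $input): string {
    // Treat underscores as spaces, then lowercase everything.
    $spaced = strtolower(str_replace('_', ' ', $input));
    // Capitalize the first letter of each space-separated word.
    $words = ucwords($spaced);
    // Capitalize the first letter after each hyphen as well.
    $parts = array_map('ucfirst', explode('-', $words));
    return trim(implode('-', $parts));
}

$a = simple_name_case('MARY_ANN SMITH-JONES'); // "Mary Ann Smith-Jones"
$b = simple_name_case('MCDONALD');             // "Mcdonald"
echo $a, "\n", $b, "\n";
```

As the second call shows, it does nothing for prefixes like Mc or O', so it only covers the simple cases.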
I also saw this somewhere, and it seems to do a good job in most cases:
/**
 * @param $string
 * @return string
 */
public function titleCase($string)
{
    $word_splitters = array(' ', '-', "O'", "L'", "D'", 'St.', 'Mc', 'Mac');
    $lowercase_exceptions = array('the', 'van', 'den', 'von', 'und', 'der', 'de', 'di', 'da', 'of', 'and', "l'", "d'");
    $uppercase_exceptions = array('III', 'IV', 'VI', 'VII', 'VIII', 'IX');

    $string = strtolower($string);
    foreach ($word_splitters as $delimiter) {
        $words = explode($delimiter, $string);
        $newwords = array();
        foreach ($words as $word) {
            if (in_array(strtoupper($word), $uppercase_exceptions)) {
                $word = strtoupper($word);
            } elseif (!in_array($word, $lowercase_exceptions)) {
                $word = ucfirst($word);
            }
            $newwords[] = $word;
        }
        if (in_array(strtolower($delimiter), $lowercase_exceptions)) {
            $delimiter = strtolower($delimiter);
        }
        $string = join($delimiter, $newwords);
    }
    return $string;
}
Names like Jurgen Macho (a footballer) are returned as Jurgen MacHo, though. As pointed out in other answers and comments, names are hard.
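One possible way to limit false positives like MacHo is to check a list of known exceptions before applying the Mac rule. This is only a sketch of that idea, not part of the snippet above: the function name `fix_mac_prefix` and the exception list are assumptions, and a real list would have to be built from your own data.

```php
<?php
// Hypothetical helper: apply the "Mac" rule to a single word unless it is a
// known exception. The default exception list is illustrative, not exhaustive.
function fix_mac_prefix(string $word, array $exceptions = ['Macho', 'Mack']): string {
    $word = ucfirst(strtolower($word));
    if (in_array($word, $exceptions, true)) {
        return $word;
    }
    // Capitalize the letter after a leading "Mac" when more letters follow.
    if (strlen($word) > 3 && strncmp($word, 'Mac', 3) === 0) {
        return 'Mac' . ucfirst(substr($word, 3));
    }
    return $word;
}

echo fix_mac_prefix('MACARTHUR'), "\n"; // MacArthur
echo fix_mac_prefix('MACHO'), "\n";     // Macho
```

It still can't know which spelling a particular person prefers, so it reduces the manual cleanup rather than removing it.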
Upvotes: 2
Reputation: 55271
You're probably aware of this, but one huge problem you'll face is that there's more than one "correct" capitalisation of some names - in your example I'd disagree with FitzGerald, for example.
Upvotes: 3