Reputation: 1161
Currently, I'm using a ucwords-based function to capitalize letters after hyphens, dots and apostrophes:
function ucwordsMore ($str){
    $str = ucwords($str);
    $str = str_replace('- ','-',ucwords(str_replace('-','- ',$str))); // hyphens
    $str = str_replace('. ','.',ucwords(str_replace('.','. ',$str))); // dots
    $str = preg_replace("/\w[\w']*/e", "ucwords('\\0')", $str); // apostrophes
    return $str;
}
It works fine for English letters. However, non-English letters are not handled properly. For instance, this text:
La dernière usine française d'accordéons reste à Tulle
is turned into this text:
La DernièRe Usine FrançAise D'accordéOns Reste à Tulle
But I need it to be:
La Dernière Usine Française D'Accordéons Reste À Tulle
Any ideas?
Upvotes: 2
Views: 1913
Reputation: 10269
Use this:
function mb_ucwords($string)
{
    return mb_convert_case($string, MB_CASE_TITLE, 'UTF-8');
}
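For illustration, here is a minimal usage sketch of the same call. It assumes the mbstring extension is loaded and the input is UTF-8 encoded; titleCaseUtf8 is a hypothetical name, not something from PHP itself. It only shows that accented letters are treated as letters; it does not cover the hyphen, dot and apostrophe cases from the question, so check those separately on your PHP version.

```php
<?php
// Assumption: the mbstring extension is loaded and the input is UTF-8.
// titleCaseUtf8 is a hypothetical helper name used only for this sketch.
function titleCaseUtf8(string $text): string
{
    // MB_CASE_TITLE upper-cases the first letter of each word and
    // lower-cases the rest, working on characters rather than bytes.
    return mb_convert_case($text, MB_CASE_TITLE, 'UTF-8');
}

if (!extension_loaded('mbstring')) {
    exit("mbstring is not available\n");
}

// Accented letters inside a word stay lowercase; word-initial ones are capitalized.
echo titleCaseUtf8('la dernière usine française reste à tulle'), "\n";
```

Unlike ucwords, this does not split "dernière" at the "è", because the string is processed as UTF-8 characters instead of single bytes.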
Upvotes: 1
Reputation:
As @Jon mentioned, you need to use a locale, which defines the relationships between upper and lower case that case-aware function calls rely on. The relevant category is typically LC_CTYPE.
There are also categories for numeric formatting, sorting, monetary formatting and others. The locale needs to be installed on your machine, or be available via plugins or modules, etc. Read up on that.
I don't know PHP's locale handling at all, so here is a sample in Perl that uses a different regex approach from yours. I couldn't quite follow your solution, so hopefully you can get some ideas from mine.
use locale;
use POSIX qw(locale_h);
setlocale(LC_CTYPE, "en_US");
$str = "La dernière usine française d'accordéons reste à Tulle";
$str =~ s/ (?:^|(?<=\s)|(?<=\w-)|(?<=\w\.)|(?<=\w\')) (\w) / uc($1) /xeg;
print "$str\n";
Output
La Dernière Usine Française D'Accordéons Reste À Tulle
Regex
The form is s///, find and replace:
s/                  # Search
  (?:               # Group
      ^             # beginning of string
    | (?<=\s)       # or, lookbehind \s
    | (?<=\w-)      # or, lookbehind \w-
    | (?<=\w\.)     # or, lookbehind \w\.
    | (?<=\w\')     # or, lookbehind \w\'
  )                 # End group
  (\w)              # Capture group 1, a single word char
/                   # Replace
  uc($1)            # Uppercased word char from capture group 1
/xeg;               # Modifiers x (expanded), e (eval), g (global)
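Since the question is about PHP, here is a rough port of the same lookbehind idea, offered as a sketch rather than a tested drop-in. It assumes UTF-8 input and the mbstring extension, and it swaps the locale-dependent \w for the Unicode properties \p{L} and \p{N} under the /u modifier, so no locale needs to be installed. ucwordsUnicode is a hypothetical name.

```php
<?php
// Hypothetical helper (name not from the thread). Assumes UTF-8 input
// and the mbstring extension. Upper-cases a letter when it sits:
//   - at the beginning of the string,
//   - right after whitespace, or
//   - right after a letter/digit followed by '-', '.' or an apostrophe.
function ucwordsUnicode(string $str): string
{
    // /u makes PCRE treat pattern and subject as UTF-8, so \p{L}
    // matches accented letters as single characters.
    $pattern = '/(?:^|(?<=\s)|(?<=[\p{L}\p{N}]-)|(?<=[\p{L}\p{N}]\.)|(?<=[\p{L}\p{N}]\'))\p{L}/u';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[0] is the single matched letter.
            return mb_strtoupper($m[0], 'UTF-8');
        },
        $str
    );
}

echo ucwordsUnicode("La dernière usine française d'accordéons reste à Tulle"), "\n";
// prints: La Dernière Usine Française D'Accordéons Reste À Tulle
```

Using preg_replace_callback also avoids the /e modifier, which is deprecated as of PHP 5.5 and removed in PHP 7. Unlike Perl's \w, the character class here does not include the underscore; add it if you need identical behavior.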
Upvotes: 2
Reputation: 17705
Have a look at the Kohana UTF8 class: http://kohanaframework.org/3.2/guide/api/UTF8
Upvotes: 0