Reputation: 168
My Perl script receives a string of UTF-8 text that could be in any language. I need to capitalize the first character of each word and convert the remaining characters of the word to lower case. The text must remain in UTF-8 afterwards.
The following seems to work well enough when the text contains only Latin characters:
$my_string =~ s/([\w']+)/\u\L$1/g;
How can I get this to work in a UTF-8 string?
Upvotes: 2
Views: 803
Reputation: 189457
See perlunicode for an overview of the facilities you need to be familiar with. Basically, you are looking for something like \p{LC}.
Your problem space is not well-defined, though: not all scripts have a concept of character case. The LC (cased letter) property matches only characters from scripts that do, so it should get you there.
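As a rough sketch of how this could fit together, assuming the input arrives as raw UTF-8 bytes rather than an already decoded Perl string (the question does not say which): decode the bytes into characters, apply the substitution from the question, and encode the result back to UTF-8. The helper names titlecase_utf8 and is_cased are hypothetical, chosen only for illustration.

```perl
use v5.14;      # turns on strict and the unicode_strings feature
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 encoded bytes, returns UTF-8 encoded bytes.
sub titlecase_utf8 {
    my ($bytes) = @_;

    # Decode so that \w, \u and \L operate on code points, not on bytes.
    my $text = decode('UTF-8', $bytes);

    # Same substitution as in the question; on decoded text \w matches
    # word characters from any script, and \u\L is ucfirst(lc(...)).
    $text =~ s/([\w']+)/\u\L$1/g;

    # Encode back so the caller still receives UTF-8 bytes.
    return encode('UTF-8', $text);
}

# Hypothetical helper: true if a single character is a cased letter (\p{LC}).
# Characters from scripts without case, such as CJK ideographs, do not match.
sub is_cased {
    my ($char) = @_;
    return $char =~ /\A\p{LC}\z/ ? 1 : 0;
}
```

Words in scripts without case pass through the substitution unchanged, since \u and \L have nothing to do there. If only cased words should be touched at all, \p{LC} could be used in the pattern in place of \w, at the cost of treating digits and underscores as word boundaries.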
Upvotes: 2