Word boundaries of UTF-8 text in Perl

Question

My Perl script receives a UTF-8 string that could be in any language. I need to capitalize the first character of each word and convert the remaining characters of the word to lower case. The text must remain UTF-8 encoded.

The following seems to work well enough when the text contains only Latin characters:

$my_string =~ s/([\w']+)/\u\L$1/g;

How can I get this to work in a UTF-8 string?

tripleee · Accepted Answer

See perlunicode for an overview of the facilities you need to be familiar with. Basically, you are looking for a Unicode property such as \p{LC} (Cased_Letter, which covers uppercase, lowercase, and titlecase letters).

Your problem space is not well-defined, though; not all scripts have a concept of character case. The LC property will only match on scripts which do, so it should get you there.
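
As a rough sketch of how this could fit together, assuming the input arrives as raw UTF-8 bytes: decode the bytes to characters, apply the substitution with Unicode properties, then encode back to UTF-8. The sub name title_case_utf8 is hypothetical, and the character class uses \p{L} and \p{M} (all letters plus combining marks) rather than \p{LC} alone, which is my own choice so that a word is not split at an uncased letter or an accent mark. Adjust it to whatever "word" should mean for your data.

```perl
use strict;
use warnings;
use utf8;                          # string literals in this file are UTF-8
use feature 'unicode_strings';     # Unicode case rules for all strings (Perl 5.12+)
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 bytes, returns UTF-8 bytes.
sub title_case_utf8 {
    my ($bytes) = @_;

    # Work on characters, not bytes, so \p{...} and case mapping see code points.
    my $text = decode('UTF-8', $bytes);

    # A "word" here is a run of letters, combining marks, and apostrophes
    # (an assumption; the question's original class was [\w']).
    # \L lowercases the whole match, then \u title-cases its first character.
    $text =~ s/([\p{L}\p{M}']+)/\u\L$1/g;

    return encode('UTF-8', $text);
}

# Usage: the literal is encoded first to simulate UTF-8 bytes read from outside.
my $input = encode('UTF-8', "élan VITAL привет МИР");
print title_case_utf8($input), "\n";
```

Text in scripts without case passes through unchanged, because lowercasing and title-casing leave such characters as they are.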
