Alexander Farber

Reputation: 22988

Matching the boundary of a Russian word with \b

Is this a bug, or am I doing something wrong? I am trying to match Russian swear words in a multiplayer game chat log on CentOS 6.5 with the stock Perl 5.10.1:

# echo блядь | perl -ne 'print if /\bбля/'

# echo блядь | perl -ne 'print if /бля/'
блядь

# echo $LANG
en_US.UTF-8

Why doesn't the first command print anything?

Upvotes: 3

Views: 131

Answers (1)

choroba

Reputation: 241908

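Without any encoding hints, Perl treats both the pattern and the input as raw bytes. Under the default rules, bytes above 0x7F are not \w characters, so there is no word boundary before the first byte of бля and \b never matches. You have to tell Perl that the source code contains UTF-8 (use utf8) and that the input (-CI) and output (-CO) are UTF-8 encoded: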

echo 'помёт' | perl -CIO -ne 'use utf8; print if /\bпомё/'
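The same fix can also be written as a standalone script rather than a one-liner. This is a minimal sketch, assuming the chat log arrives on STDIN as UTF-8; use open qw(:std :encoding(UTF-8)) takes the place of -CIO, and the helper name starts_word_with_blya is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                            # string literals in this file are UTF-8 text
use open qw(:std :encoding(UTF-8));  # decode STDIN, encode STDOUT/STDERR (like -CIO)

# Hypothetical helper: true if a word in $line starts with "бля".
# \b needs a non-word character (or the start of the string) before "б",
# so a mid-word occurrence such as in "оскорблять" is not flagged.
sub starts_word_with_blya {
    my ($line) = @_;
    return $line =~ /\bбля/ ? 1 : 0;
}

# Print only the flagged lines of the chat log read from STDIN.
while (my $line = <STDIN>) {
    print $line if starts_word_with_blya($line);
}
```

Usage would be something like perl filter.pl < chat.log (both file names are placeholders). Because the input is decoded into characters, Cyrillic letters count as \w, which is what lets \b tell блядь apart from the бля inside оскорблять.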

Upvotes: 4
