Reputation: 13735
I'd like to match CJK characters, but the following regex, [[:alpha:]]\+,
does not work. Does anybody know how to match CJK characters?
$ echo '程 a b' | sed -e 's/\([[:alpha:]]\+\)/x\1/g'
程 xa xb
The desired output is x程 a b.
Upvotes: 1
Views: 6140
Reputation: 627082
With Perl, your solution will look like
perl -CSD -Mutf8 -pe 's/\p{Han}+/x$&/g' filename
Or, with Perl versions before 5.20, where using $& carries a performance penalty, use a capturing group:
perl -CSD -Mutf8 -pe 's/(\p{Han}+)/x$1/g' filename
To modify the file contents in place, add the -i option:
perl -i -CSD -Mutf8 -pe 's/(\p{Han}+)/x$1/g' filename
NOTES
\p{Han} matches a single Chinese (Han) character; \p{Han}+ matches runs of one or more Chinese characters.
$1 is the backreference to the value captured with (\p{Han}+); $& inserts the whole match value.
-Mutf8 lets Perl recognize the UTF-8-encoded characters used directly in your Perl code.
-CSD (equivalent to -CIOED) enables input decoding and output re-encoding (it works for UTF-8 encoding).
Upvotes: 0