Reputation: 405
I have the following problem:
$ echo ača | tr 'č' 'c'
acca
Why does it give me a double "c"? How can I solve that? I want aca, not acca.
Upvotes: 1
Views: 253
Reputation: 47119
č is two bytes long when encoded as UTF-8:
charinfo č
U+010D LATIN SMALL LETTER C HACEK [Ll]
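tr works on bytes, so it will see č as two characters of one byte each. The second argument is then shorter than the first, so tr (GNU tr, at least) extends it by repeating its last character until every character in the first argument has a replacement, which is why you get two c's.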
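To check the byte-level explanation yourself, here is a small sketch. It assumes a POSIX shell with od and wc, and a tr that pads the second set the way GNU tr does. The variable names are only for illustration, and č is written as its two UTF-8 bytes in octal so the result does not depend on your terminal's encoding.

```shell
# U+010D (č) as its two UTF-8 bytes, written in octal escapes.
c_caron='\304\215'

# Count the bytes (wc may pad with spaces on some systems, so strip them).
byte_count=$(printf "$c_caron" | wc -c | tr -d ' ')
echo "$byte_count"    # 2

# Dump the bytes in hex, with od's spacing removed.
hex_bytes=$(printf "$c_caron" | od -An -tx1 | tr -d ' \n')
echo "$hex_bytes"     # c48d

# Byte-wise tr (forced with LC_ALL=C): set1 has two bytes, set2 has one,
# so set2 is padded to 'cc' and each of the two bytes becomes a 'c'.
result=$(printf "a${c_caron}a\n" | LC_ALL=C tr "$c_caron" 'c')
echo "$result"        # acca
```

Each of the two bytes is translated separately, which is exactly where the extra c comes from.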
You could use sed instead (this might only work with GNU sed):
echo ača | sed 'y/č/c/'
Or Perl:
echo ača | perl -pe 'use open qw(:std :utf8);use utf8;y/č/c/'
Consider this example, which might help you understand what's happening:
% echo abc | tr 'abc' 'de'
dee
Upvotes: 4