How to make the shell "uniq" command accent-insensitive?
# more test
a
à
b
# LC_ALL=fr_FR.UTF-8 uniq test
a
à
b
Expected:
# LC_ALL=fr_FR.UTF-8 uniq test
a
b
Note: the following is not OK, because it changes the input data:
cat test | sed "s/à/a/" | uniq
This works for your simple example:
$ cat letters.txt
a
à
b
$ paste <(iconv -f utf8 -t ascii//translit letters.txt) letters.txt | sort -s -k1,1 -u | cut -f2
a
b
It requires the GNU version of iconv to support transliteration to the output encoding, and a shell like bash or zsh that supports <(command) process substitution.
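A possible refinement, sketched below as two plain sh functions. The names sort_uniq_noaccent and uniq_noaccent are made up for this example, and both assume UTF-8 input whose lines contain no tab characters. Both pass the transliterated copy to paste through its "-" (standard input) argument instead of <(command), so they also work in sh. The first keeps the sort approach but adds a tab -t separator. By default sort splits fields at blanks, so with -k1,1 the lines "a b" and "a c" would both get the key "a", and -u would drop one of them. The second behaves more like uniq: awk replaces sort, only adjacent lines are compared, and the input order is kept.

```shell
# Hypothetical helper names, not standard commands.
# Both assume FILE is UTF-8 text whose lines contain no tab characters.

# sort_uniq_noaccent FILE: sorted output, one line per accent-insensitive key.
sort_uniq_noaccent() {
  tab=$(printf '\t')
  # Column 1: accent-stripped key (paste reads it from stdin via "-");
  # column 2: the original line. -t "$tab" makes sort split on that tab
  # only, so the key is the whole transliterated line, not its first word.
  # -s -u keep the first line, in input order, of each run of equal keys.
  iconv -f utf8 -t ascii//translit "$1" | paste - "$1" |
    sort -t "$tab" -s -k1,1 -u | cut -f2
}

# uniq_noaccent FILE: like uniq, collapse only adjacent lines whose keys
# match, keep the input order, and print the first line of each run.
uniq_noaccent() {
  # key = $1 "" forces a string comparison, so "1" and "1.0" stay distinct.
  iconv -f utf8 -t ascii//translit "$1" | paste - "$1" |
    awk -F '\t' '{ key = $1 "" } NR == 1 || key != prev { print $2 } { prev = key }'
}
```

Usage: uniq_noaccent letters.txt. Where iconv transliterates à to a, it should print a and b for the example above. The transliteration result may depend on the current locale, so run it under a UTF-8 locale such as the fr_FR.UTF-8 one from the question.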