JoJo

Reputation: 87

uniq: how to make it accent-insensitive?

How can I make the shell "uniq" command accent-insensitive?

# more test
a
à
b


# LC_ALL=fr_FR.UTF-8  uniq test
a
à
b

Expected:

# LC_ALL=fr_FR.UTF-8  uniq test
a
b

Note: the following is not acceptable, because it would change the input data:

 cat test | sed "s/à/a/" | uniq

Upvotes: 1

Views: 168

Answers (1)

Shawn

Reputation: 52549

This works for your simple example:

$ cat letters.txt
a
à
b
$ paste <(iconv -f utf8 -t ascii//translit letters.txt) letters.txt | sort -s -k1,1 -u | cut -f2
a
b

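It requires the GNU version of iconv, which supports transliteration to the output encoding, and a shell such as bash or zsh that supports <(command) process substitution. Note that sort reorders the lines by their transliterated key, so unlike plain uniq the output is sorted and duplicates do not need to be adjacent.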
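
If the original line order matters, or the shell lacks process substitution, the same idea can be sketched with a plain pipe and awk instead of sort -u. This is only a sketch, not a drop-in replacement: accent_uniq is a made-up helper name, it assumes UTF-8 input with no tab characters in the lines, and it still relies on an iconv that supports //TRANSLIT (with glibc the transliteration result may also depend on the active locale). Like uniq, it collapses only adjacent lines whose transliterated forms match, and it prints the first original line of each run.

```shell
# accent_uniq FILE  (hypothetical helper, not a standard command)
# Assumes: UTF-8 input, no tab characters in the lines, iconv with //TRANSLIT.
accent_uniq() {
    # 1. iconv writes an ASCII-transliterated copy of each line (the key).
    # 2. paste puts that key in column 1 and the untouched line in column 2.
    # 3. awk keeps a line only when its key differs from the previous key.
    # 4. cut drops the key, so the original text comes out unmodified.
    iconv -f UTF-8 -t ASCII//TRANSLIT "$1" |
        paste - "$1" |
        awk -F '\t' 'NR == 1 || $1 != prev { print } { prev = $1 }' |
        cut -f 2-
}
```

It would be called as accent_uniq letters.txt. Because the comparison happens on the transliterated column only, the accented characters in the surviving lines are never rewritten.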

Upvotes: 2
