watain

Reputation: 5098

Bash: Convert non-ASCII characters to ASCII

How can I convert a string like Žvaigždės aukštybėj užges or äüöÖÜÄ to Zvaigzdes aukstybej uzges or auoOUA, respectively, using Bash?

Basically, I just want to replace every character that isn't in the basic Latin (ASCII) alphabet with its closest ASCII equivalent.

Thanks

Upvotes: 40

Views: 53458

Answers (5)

GLNB

Reputation: 109

You can also use the Python library unidecode, which provides a command-line tool of the same name, to do this:

$ echo "Žvaigždės aukštybėj užges äüöÖÜÄ" | unidecode

Output:

Zvaigzdes aukstybej uzges auoOUA

See this post for other approaches.

Upvotes: 4

Steve De Caux

Reputation: 1779

You might be able to use iconv.

For example, the string:

Žvaigždės aukštybėj užges or äüöÖÜÄ

is stored in the file testutf8.txt, encoded as UTF-8.

Running the command:

iconv -f UTF8 -t US-ASCII//TRANSLIT testutf8.txt

results in:

Zvaigzdes aukstybej uzges or auoOUA
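To keep the original file and write the result somewhere else, the command above could be wrapped as in the sketch below. The function name convert_file and the default ".ascii" output suffix are assumptions made for illustration; only the iconv call comes from this answer.

```shell
#!/usr/bin/env bash
# convert_file: hypothetical helper, not part of iconv itself.
# Usage: convert_file SOURCE [DESTINATION]
# Writes an ASCII transliteration of SOURCE (assumed to be UTF-8) to
# DESTINATION, which defaults to SOURCE with ".ascii" appended.
convert_file() {
    local src=$1
    local dst=${2:-$src.ascii}
    if iconv -f UTF-8 -t US-ASCII//TRANSLIT "$src" > "$dst"; then
        # Report where the converted copy was written.
        printf '%s\n' "$dst"
    else
        # The conversion failed, so remove the partial output.
        rm -f "$dst"
        return 1
    fi
}
```

For example, convert_file testutf8.txt would leave testutf8.txt untouched and write testutf8.txt.ascii. How well the accented letters are transliterated still depends on the iconv implementation and the active locale.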

Upvotes: 18

user1244254

Reputation:

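In Java (not Bash), you can normalize the string and strip the combining marks:

    import java.nio.charset.StandardCharsets;
    import java.text.Normalizer;

    public class AsciiFold {
        public static void main(String[] args) {
            String name = "Žvaigždės aukštybėj užges ";
            // NFKD splits each accented letter into a base letter plus combining marks.
            String s1 = Normalizer.normalize(name, Normalizer.Form.NFKD);
            // Match combining marks, modifier letters (Lm) and modifier symbols (Sk).
            String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";
            // Anything still outside ASCII becomes '?' in the byte round trip.
            String s2 = new String(s1.replaceAll(regex, "").getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
            System.out.println(s2);
        }
    }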

Upvotes: 0

Emil Vikström

Reputation: 91902

echo Hej på dig, du den dära | iconv -f utf-8 -t us-ascii//TRANSLIT

gives:

Hej pa dig, du den dara

Upvotes: 8

Michael Krelin - hacker

Reputation: 143061

Depending on your machine you can try piping your strings through

iconv -f utf-8 -t ascii//translit

(or whatever your encoding is, if it's not utf-8)
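If you do this often, the pipe can be wrapped in a small function. This is only a sketch: the name to_ascii is made up for illustration, and it assumes the input is UTF-8 and that your iconv supports the //TRANSLIT suffix.

```shell
#!/usr/bin/env bash
# to_ascii: hypothetical helper name, not a standard command.
# Transliterates UTF-8 text given as arguments or, when called
# without arguments, read from standard input.
to_ascii() {
    if [ "$#" -gt 0 ]; then
        # Join the arguments with spaces and feed them to iconv.
        printf '%s\n' "$*" | iconv -f utf-8 -t ascii//TRANSLIT
    else
        # No arguments: act as a filter in a pipeline.
        iconv -f utf-8 -t ascii//TRANSLIT
    fi
}
```

It can be called as to_ascii "äüöÖÜÄ" or used as a filter, e.g. some_command | to_ascii. The "depending on your machine" caveat still applies: the same input may come out as plain letters, as question marks, or with extra punctuation, depending on the iconv implementation and the current locale, so it is worth checking the output on the target system.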

Upvotes: 70
