Reputation: 5098
How can I convert a string like "Žvaigždės aukštybėj užges" or "äüöÖÜÄ" to "Zvaigzdes aukstybej uzges" or "auoOUA", respectively, using Bash?
Basically, I just want to convert every character that isn't in the basic Latin (ASCII) alphabet to its closest ASCII equivalent.
Thanks
Upvotes: 40
Views: 53458
Reputation: 109
You can also use the Python library unidecode to do this from the command line:
$ echo "Žvaigždės aukštybėj užges äüöÖÜÄ" | unidecode
Output:
Zvaigzdes aukstybej uzges auoOUA
See this post for other approaches.
Upvotes: 4
Reputation: 1779
You might be able to use iconv.
For example, suppose the UTF-8 encoded file testutf8.txt contains the string:
Žvaigždės aukštybėj užges or äüöÖÜÄ
Running the command:
iconv -f UTF8 -t US-ASCII//TRANSLIT testutf8.txt
results in:
Zvaigzdes aukstybej uzges or auoOUA
Upvotes: 18
Reputation:
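If you can use Java instead, decompose the string with Normalizer and strip the combining marks:
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

String name = "Žvaigždės aukštybėj užges ";
// NFKD splits each accented letter into a base letter plus combining marks
String s1 = Normalizer.normalize(name, Normalizer.Form.NFKD);
// Remove combining diacritics, modifier letters (Lm) and modifier symbols (Sk)
String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";
// Round-trip through US-ASCII (no checked exception with StandardCharsets);
// any leftover non-ASCII character becomes '?'
String s2 = new String(s1.replaceAll(regex, "").getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);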
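A self-contained, runnable version of the Java approach might look like the sketch below. The class name AsciiFold and the method toAscii are hypothetical, and the sample strings are written with Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.regex.Pattern;

public class AsciiFold {
    // Combining diacritics plus modifier letters (Lm) and modifier symbols (Sk),
    // the same character classes as the snippet above
    private static final Pattern MARKS =
            Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    // Hypothetical helper: fold accented Latin text to plain ASCII
    static String toAscii(String input) {
        // NFKD decomposes e.g. "z with caron" into "z" followed by U+030C COMBINING CARON
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        String stripped = MARKS.matcher(decomposed).replaceAll("");
        // Anything still outside ASCII (letters with no decomposition) becomes '?'
        return new String(stripped.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The question's two examples, plus three letters that have no decomposition
        System.out.println(toAscii("\u017Dvaig\u017Ed\u0117s auk\u0161tyb\u0117j u\u017Eges")); // Zvaigzdes aukstybej uzges
        System.out.println(toAscii("\u00E4\u00FC\u00F6\u00D6\u00DC\u00C4")); // auoOUA
        System.out.println(toAscii("\u0142\u00F8\u00DF")); // ???
    }
}
```

The regex only removes marks that NFKD splits off, so letters such as ł, ø or ß survive the stripping and are then replaced by '?' in the ASCII round-trip. A transliteration tool such as unidecode maps many of these to ASCII letters instead.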
Upvotes: 0
Reputation: 91902
echo Hej på dig, du den dära | iconv -f utf-8 -t us-ascii//TRANSLIT
gives:
Hej pa dig, du den dara
Upvotes: 8
Reputation: 143061
Depending on your system (transliteration results vary with the iconv implementation and the current locale), you can try piping your strings through:
iconv -f utf-8 -t ascii//translit
(replace utf-8 with your input encoding if it's something else)
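The -f option tells iconv which encoding the input bytes are in, and the same concern applies to the Java approach above: the bytes must be decoded with their real charset before the accents can be stripped. A minimal sketch, assuming ISO-8859-1 input; the class name Latin1Fold and its toAscii method are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class Latin1Fold {
    // Hypothetical helper: sourceCharset plays the role of iconv's -f option
    static String toAscii(byte[] raw, Charset sourceCharset) {
        String text = new String(raw, sourceCharset);
        // Decompose, then drop the combining marks
        String stripped = Normalizer.normalize(text, Normalizer.Form.NFKD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Leftover non-ASCII characters become '?'
        return new String(stripped.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The question's second example encoded as ISO-8859-1 (one byte per character)
        byte[] latin1 = "\u00E4\u00FC\u00F6\u00D6\u00DC\u00C4".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toAscii(latin1, StandardCharsets.ISO_8859_1)); // auoOUA
        // Decoding the same bytes as UTF-8 is wrong: they are not valid UTF-8, so the
        // malformed sequences become U+FFFD replacement characters and then '?'
        System.out.println(toAscii(latin1, StandardCharsets.UTF_8));
    }
}
```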
Upvotes: 70