Hanny

Reputation: 165

How to convert from unicode to ASCII

Is there any way to convert unicode values to ASCII?

Upvotes: 11

Views: 77212

Answers (7)

georgiosd

Reputation: 3059

It depends what you mean by "convert".

You can transliterate using the AnyAscii package.

// C#
using AnyAscii;

string s = "άνθρωποι".Transliterate();
// anthropoi

Upvotes: 5

Svonstruen

Reputation: 1

If your metadata fields only accept ASCII input, Unicode characters can be converted to their TeX equivalents through MathJax. What is MathJax? MathJax is a JavaScript display engine for rendering TeX or MathML-coded mathematics in browsers without requiring font installation or browser plug-ins. Any modern browser with JavaScript enabled will be MathJax-ready. For general information about MathJax, visit mathjax.org.

Upvotes: 0

Rednael

Reputation: 370

This workaround might better suit your needs. It strips the non-ASCII characters from a string and keeps only the ASCII ones.

byte[] bytes = Encoding.ASCII.GetBytes("eéêëèiïaâäàåcç  test");
char[] chars = Encoding.ASCII.GetChars(bytes);
string line = new String(chars);
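// Every non-ASCII char was encoded as '?'; removing them also removes any genuine '?' in the input
line = line.Replace("?", "");
// Results in "eiac test"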

Please note that the 2nd "space" in the input string is not a regular space: it is the character with value 255, which lies outside the ASCII range (0 to 127) and is therefore stripped as well.

Upvotes: 3

79E09796

Reputation: 2230

To simply strip the accents from unicode characters you can use something like:

string.Concat(input.Normalize(NormalizationForm.FormD).Where(
  c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
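
For anyone who wants to paste this into a project, here is a minimal self-contained sketch of the same idea. The class and method names (DiacriticsDemo, RemoveDiacritics) are made up for illustration, and the final recomposition to FormC is an extra step that is not part of the one-liner above.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper name; wraps the one-liner shown above.
    public static string RemoveDiacritics(string input)
    {
        // FormD splits each accented letter into a base letter plus combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        // Drop the combining marks (category NonSpacingMark) and keep everything else.
        var kept = decomposed.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

        // Assumption: recompose afterwards so any remaining sequences are in composed form.
        return string.Concat(kept).Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // Prints: eeeeeiiaaaaacc test
        Console.WriteLine(RemoveDiacritics("eéêëèiïaâäàåcç test"));
    }
}
```

Keep in mind that this only removes combining marks. Letters such as "ø" or "ß" have no decomposition, so they pass through unchanged and the result is not guaranteed to be pure ASCII.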

Upvotes: 27

Kyle Rosendo

Reputation: 25287

Technically, yes you can by using Encoding.ASCII.

Example (from byte[] to ASCII):

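// Convert the string to UTF-16 ("Unicode") bytes
byte[] uni = Encoding.Unicode.GetBytes("Whatever unicode string you have");

// Re-encode those bytes as ASCII; characters outside ASCII become '?'
byte[] asciiBytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, uni);

// Decode the ASCII bytes back into a string
string ascii = Encoding.ASCII.GetString(asciiBytes);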

Just remember that Unicode is a much larger standard than ASCII, and there will be characters that simply cannot be correctly encoded. Have a look here for tables and a little more information on the two encodings.

Upvotes: 5

Kilian Foth

Reputation: 14386

You CAN'T convert from Unicode to ASCII. Almost every character in Unicode cannot be expressed in ASCII, and those that can be expressed have exactly the same codepoints in ASCII as in UTF-8, which is probably what you have. Almost the only thing you can do that is even close to the right thing is to discard all characters above codepoint 127, and even that is very likely nowhere near what your requirements say. (The other possibility is to simplify accented or umlauted letters to make more than 128 characters 'nearly' expressible, but that still doesn't even begin to actually cover Unicode.)
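
The "discard everything outside ASCII" option could look roughly like the sketch below. The names AsciiFilterDemo and KeepAsciiOnly are invented for this example. Filtering on the character value directly means a genuine '?' in the input survives, unlike approaches that encode first and then remove question marks.

```csharp
using System;
using System.Linq;

public static class AsciiFilterDemo
{
    // Hypothetical helper: keeps only chars in the ASCII range 0..127.
    public static string KeepAsciiOnly(string input)
    {
        // Both halves of a surrogate pair are above 127, so they are dropped too.
        return string.Concat(input.Where(c => c <= 127));
    }

    public static void Main()
    {
        // \u00E9 is 'é' and \u00F6 is 'ö'; prints: hllo? wrld
        Console.WriteLine(KeepAsciiOnly("h\u00E9llo? w\u00F6rld"));
    }
}
```

As the answer says, this loses information: the accented letters simply vanish, which is rarely what real requirements ask for.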

Upvotes: 3

Dean Harding

Reputation: 72668

Well, seeing as how there's some 100,000+ unicode characters and only 128 ASCII characters, a 1-1 mapping is obviously impossible.

You can use the Encoding.ASCII object to get the ASCII byte values from a Unicode string, though.
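
As a rough illustration of what that gives you: Encoding.ASCII silently substitutes '?' (byte 63) for anything it cannot represent. If you would rather detect that case, you can ask for an ASCII encoding with exception fallbacks instead. The class and method names below (AsciiBytesDemo, IsPureAscii) are made up for the sketch.

```csharp
using System;
using System.Text;

public static class AsciiBytesDemo
{
    // Hypothetical helper: true if every character can be encoded as ASCII.
    public static bool IsPureAscii(string input)
    {
        // An ASCII encoding that throws instead of substituting '?'.
        Encoding strict = Encoding.GetEncoding(
            "us-ascii", new EncoderExceptionFallback(), new DecoderExceptionFallback());
        try
        {
            strict.GetBytes(input);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // \u00E9 is 'é'; it has no ASCII value, so it becomes '?' (63).
        byte[] bytes = Encoding.ASCII.GetBytes("h\u00E9llo");
        Console.WriteLine(string.Join(" ", bytes));     // 104 63 108 108 111

        Console.WriteLine(IsPureAscii("hello"));        // True
        Console.WriteLine(IsPureAscii("h\u00E9llo"));   // False
    }
}
```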

Upvotes: 2
