Reputation: 5970
I have a string of UTF-8 characters that needs to be output to an older system as UTF-7, and I have two questions about this.
How can I efficiently convert a string s that contains UTF-8 characters into the same string without those characters?
Is there a simple way of converting extended characters like 'Ō' to their closest non-extended equivalent, 'O'?
Upvotes: 3
Views: 5256
Reputation: 1502276
If the older system can actually handle UTF-7 properly, why do you want to remove anything? Just encode the string as UTF-7:
string text = LoadFromWherever(Encoding.UTF8);
byte[] utf7 = Encoding.UTF7.GetBytes(text);
Then send the UTF-7-encoded text down to the older system.
If you've got the original UTF-8-encoded bytes, you can do this in one step:
byte[] utf7 = Encoding.Convert(Encoding.UTF8, Encoding.UTF7, utf8);
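As a rough, self-contained sketch of that one-step conversion (the class name Utf7Demo, the method name Utf8ToUtf7 and the sample string are made up for illustration): note that Encoding.UTF7 is marked obsolete in .NET 5 and later, so the sketch suppresses that warning and assumes the property is still usable on your runtime.

```csharp
using System;
using System.Text;

public static class Utf7Demo
{
    // Hypothetical helper: re-encodes UTF-8 bytes as UTF-7 bytes in one step.
    public static byte[] Utf8ToUtf7(byte[] utf8)
    {
#pragma warning disable SYSLIB0001 // Encoding.UTF7 is obsolete in .NET 5 and later
        return Encoding.Convert(Encoding.UTF8, Encoding.UTF7, utf8);
#pragma warning restore SYSLIB0001
    }

    public static void Main()
    {
        // Sample input; in practice these bytes come from the UTF-8 source.
        byte[] utf8 = Encoding.UTF8.GetBytes("Ōsaka");
        byte[] utf7 = Utf8ToUtf7(utf8);

        // UTF-7 output only uses 7-bit bytes, so it can be shown as ASCII text.
        Console.WriteLine(Array.TrueForAll(utf7, b => b < 0x80));
        Console.WriteLine(Encoding.ASCII.GetString(utf7));
    }
}
```

Nothing is lost here: decoding the result as UTF-7 on the receiving side should give back the original text, including the 'Ō'.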
If you actually need to convert to ASCII, you can do this reasonably easily.
To remove the non-ASCII characters:
var encoding = Encoding.GetEncoding(
    "us-ascii",
    new EncoderReplacementFallback(""),
    new DecoderReplacementFallback(""));
byte[] ascii = encoding.GetBytes(text);
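To convert non-ASCII characters to their nearest ASCII equivalent:
// FormKD splits 'Ō' into 'O' plus a combining macron.
string normalized = text.Normalize(NormalizationForm.FormKD);
// The empty replacement fallback then drops the macron and anything else non-ASCII.
var encoding = Encoding.GetEncoding(
    "us-ascii",
    new EncoderReplacementFallback(""),
    new DecoderReplacementFallback(""));
byte[] ascii = encoding.GetBytes(normalized);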
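To compare the two ASCII conversions above side by side, here is a minimal sketch that wraps them in helpers. The names AsciiFolding, StripNonAscii and FoldToAscii are invented for this example, and the helpers return strings rather than bytes purely to make the results easy to inspect.

```csharp
using System;
using System.Text;

public static class AsciiFolding
{
    // ASCII encoding that silently drops anything it cannot represent.
    private static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii",
        new EncoderReplacementFallback(""),
        new DecoderReplacementFallback(""));

    // Removes every non-ASCII character outright.
    public static string StripNonAscii(string text)
    {
        return StrictAscii.GetString(StrictAscii.GetBytes(text));
    }

    // Decomposes characters first, so the base letter survives
    // and only the combining marks are dropped.
    public static string FoldToAscii(string text)
    {
        return StripNonAscii(text.Normalize(NormalizationForm.FormKD));
    }

    public static void Main()
    {
        Console.WriteLine(StripNonAscii("Ōsaka")); // saka
        Console.WriteLine(FoldToAscii("Ōsaka"));   // Osaka
    }
}
```

One caveat with this approach: letters that have no Unicode decomposition, such as 'ø' or 'ß', are not mapped to a lookalike and are simply removed, so they would need an explicit replacement table if you want to keep them.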
Upvotes: 6