Reputation:
I want to filter a string that contains some unwanted non-ASCII characters. The same text looks different in Notepad, Visual Studio 2010, and MySQL.
How can I check whether a string contains non-ASCII characters, and how can I remove them?
Upvotes: 1
Views: 7271
Reputation: 276
string testString = Regex.Replace(OldString, @"[\u0000-\u0008\u000A-\u001F\u0100-\uFFFF]", "");
Upvotes: 1
Reputation: 2382
This has been a godsend:
Regex.Replace(input, @"[^\u0000-\u007F]", "");
I think I originally found it elsewhere, but the same answer is given in this question:
How can you strip non-ASCII characters from a string? (in C#)
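To cover the "check" half of the question as well, here is a minimal sketch built on the same pattern. The class name AsciiFilter and the method names HasNonAscii and StripNonAscii are my own, chosen for illustration; they are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilter // hypothetical helper name
{
    // Matches any single character outside the 7-bit ASCII range.
    private static readonly Regex NonAscii = new Regex(@"[^\u0000-\u007F]");

    // True if the string contains at least one non-ASCII character.
    public static bool HasNonAscii(string input)
    {
        return NonAscii.IsMatch(input);
    }

    // Returns a copy of the string with every non-ASCII character removed.
    public static string StripNonAscii(string input)
    {
        return NonAscii.Replace(input, "");
    }
}

public static class Program
{
    public static void Main()
    {
        string input = "AB£ȼCD";
        Console.WriteLine(AsciiFilter.HasNonAscii(input));   // True
        Console.WriteLine(AsciiFilter.StripNonAscii(input)); // ABCD
    }
}
```

Note that this keeps ASCII control characters (tabs, line breaks, and so on); if you want printable characters only, you would need a narrower range.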
Upvotes: 1
Reputation: 11910
First, you need to determine what you mean by a "word". If it is non-ASCII, does that imply non-English?
Personally, I'd ask why you need to do this, and which fundamental assumption in your application conflicts with your data. Depending on the situation, I suggest you either re-encode the text from its source encoding (bearing in mind that this is a lossy conversion) or fix that assumption so that your application handles the data correctly.
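If you do go the re-encoding route, one possible sketch is below. It assumes the text is already in a .NET string and that you simply want unrepresentable characters dropped; the variable name asciiDrop is mine. It round-trips the string through an ASCII encoding whose fallback replaces anything it cannot encode with an empty string.

```csharp
using System;
using System.Text;

public static class Program
{
    public static void Main()
    {
        // ASCII encoding that silently drops anything it cannot represent
        // (the default fallback would substitute '?' instead).
        Encoding asciiDrop = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderExceptionFallback());

        string input = "AB£ȼCD";

        // Round-trip through bytes. This is lossy: £ and ȼ are gone for good.
        string cleaned = asciiDrop.GetString(asciiDrop.GetBytes(input));

        Console.WriteLine(cleaned); // ABCD
    }
}
```

The data loss here is exactly why it is worth checking first whether the application could handle the original encoding instead.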
Upvotes: 0
Reputation: 128327
I think something as simple as this would probably work, wouldn't it?
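using System.Linq; // needed for Where() and ToArray()

// Extension methods must live in a static class; the class name is arbitrary.
public static class StringExtensions
{
    // Keeps only characters whose code is <= 127 (ASCII) or <= 255 (Latin-1 range).
    public static string AsciiOnly(this string input, bool includeExtendedAscii)
    {
        int upperLimit = includeExtendedAscii ? 255 : 127;
        char[] asciiChars = input.Where(c => (int)c <= upperLimit).ToArray();
        return new string(asciiChars);
    }
}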
Example usage:
string input = "AB£ȼCD";
string asciiOnly = input.AsciiOnly(false); // returns "ABCD"
string extendedAsciiOnly = input.AsciiOnly(true); // returns "AB£CD"
Upvotes: -1
Reputation: 1038930
You could use a regular expression to filter out non-ASCII characters. This one keeps only printable ASCII plus tab, carriage return, and line feed:
string input = "AB £ CD";
string result = Regex.Replace(input, "[^\x0d\x0a\x20-\x7e\t]", "");
Upvotes: 4
Reputation: 39695
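You could use regular expressions:
Regex.Replace(input, "[^a-zA-Z0-9]+", "")
Note that this also removes spaces and punctuation, not just non-ASCII characters.
You could also use \W+ as the pattern to remove any non-word character. Be aware that in .NET \w is Unicode-aware by default, so \W+ leaves non-ASCII letters in place unless you pass RegexOptions.ECMAScript.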
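A quick sketch of how \W+ behaves in .NET, in case it helps. The sample string is made up; the point is that \w matches Unicode letters by default and is restricted to [a-zA-Z0-9_] only under RegexOptions.ECMAScript.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    public static void Main()
    {
        string input = "héllo wörld";

        // Default behavior: é and ö count as word characters, so only the space is removed.
        string unicodeAware = Regex.Replace(input, @"\W+", "");
        Console.WriteLine(unicodeAware); // héllowörld

        // ECMAScript mode: \w is [a-zA-Z0-9_], so é and ö are removed as well.
        string asciiWordsOnly = Regex.Replace(input, @"\W+", "", RegexOptions.ECMAScript);
        Console.WriteLine(asciiWordsOnly); // hllowrld
    }
}
```

Either way, spaces and punctuation are stripped too, so this only fits if you really want letters, digits, and underscores and nothing else.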
Upvotes: 1