Reputation: 31778
How do I strip non alphanumeric characters from a string and loose spaces in C# with Replace?
I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).
"Hello there(hello#)".Replace(regex-i-want, "");
should give
"Hellotherehello"
I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", "");
but the spaces remain.
Upvotes: 49
Views: 49031
Reputation: 186823
You can use Linq to filter out required characters:
String source = "Hello there(hello#)";
// "Hellotherehello"
String result = new String(source
.Where(ch => Char.IsLetterOrDigit(ch))
.ToArray());
Or
String result = String.Concat(source
.Where(ch => Char.IsLetterOrDigit(ch)));
And so you have no need in regular expressions.
Upvotes: 23
Reputation: 5989
Use following regex to strip those all characters from the string using Regex.Replace
([^A-Za-z0-9\s])
Upvotes: -2
Reputation: 788
And as a replace operation as an extension method:
public static class StringExtensions
{
public static string ReplaceNonAlphanumeric(this string text, char replaceChar)
{
StringBuilder result = new StringBuilder(text.Length);
foreach(char c in text)
{
if(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
result.Append(c);
else
result.Append(replaceChar);
}
return result.ToString();
}
}
And test:
[TestFixture]
public sealed class StringExtensionsTests
{
[Test]
public void Test()
{
Assert.AreEqual("text_LaLa__lol________123______", "text LaLa (lol) á ñ $ 123 ٠١٢٣٤".ReplaceNonAlphanumeric('_'));
}
}
Upvotes: 2
Reputation: 11
var text = "Hello there(hello#)";
var rgx = new Regex("[^a-zA-Z0-9]");
text = rgx.Replace(text, string.Empty);
Upvotes: 1
Reputation: 260
Or you can do this too:
public static string RemoveNonAlphanumeric(string text)
{
StringBuilder sb = new StringBuilder(text.Length);
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
sb.Append(text[i]);
}
return sb.ToString();
}
Usage:
string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ñ $ 123 ٠١٢٣٤");
//text: textLaLalol123
Upvotes: 3
Reputation: 31778
The mistake made above was using Replace incorrectly (it doesn't take regex, thanks CodeInChaos).
The following code should do what was specified:
Regex reg = new Regex(@"[^\p{L}\p{N}]+");//Thanks to Tim Pietzcker for regex
string regexed = reg.Replace("Hello there(hello#)", "");
This gives:
regexed = "Hellotherehello"
Upvotes: 2
Reputation: 336448
In your regex, you have excluded the spaces from being matched (and you haven't used Regex.Replace()
which I had overlooked completely...):
result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");
should work. The +
makes the regex a bit more efficient by matching more than one consecutive non-alphanumeric character at once instead of one by one.
If you want to keep non-ASCII letters/digits, too, use the following regex:
@"[^\p{L}\p{N}]+"
which leaves
BonjourmesélèvesGutenMorgenliebeSchüler
instead of
BonjourmeslvesGutenMorgenliebeSchler
Upvotes: 70
Reputation: 242
In .Net 4.0 you can use the IsNullOrWhitespace method of the String class to remove the so called white space characters. Please take a look here http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx However as @CodeInChaos pointed there are plenty of characters which could be considered as letters and numbers. You can use a regular expression if you only want to find A-Za-z0-9.
Upvotes: -6