Reputation: 1747
I want to make sure that C# string does not contain specific characters.
I am using string.IndexOfAny(char[])
, in my opinion Regex would be slower in this task. Is there a better way to accomplish this? Speed is critical in my application.
Upvotes: 6
Views: 2165
Reputation: 460158
You could use this concise and efficient LINQ query:
HashSet<char> specificChars = new HashSet<char>{ 'a', 'b', 'c'};
bool notContainsSpecificChars = !"test".Any(specificChars.Contains); // true
I have used a HashSet<char>
since it is efficient for lookups, duplicates aren't allowed.
if you have an array as input you could use the constructor to create a HashSet
from it:
char[] chars = new[] { 'a', 'b', 'c', 'c' };
specificChars = new HashSet<char>(chars); // c is removed since it was a duplicate
Another approach without HashSet
is using Enumerable.Intersect
+ Enumerable.Any
:
bool notContainsSpecificChars = !"test".Intersect(chars).Any();
Upvotes: 2
Reputation: 19407
Ran a quick benchmark on IndexOf
vs IndexOfAny
vs Regex
vs Hashset
.
500 word lorem ipsum haystack, with two character needle.
Tested with both needles in haystack, one in haystack, neither in haystack.
private long TestIndexOf(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
int x = haystack.IndexOfAny(needles);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestRegex(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
Regex regex = new Regex(string.Join("|", needles));
for (int i = 0; i < 1000000; i++)
{
Match m = regex.Match(haystack);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestIndexOf(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
int x = haystack.IndexOf(needles[0]);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestHashset(string haystack, char[] needles)
{
HashSet<char> specificChars = new HashSet<char>(needles.ToList());
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
bool notContainsSpecificChars = !haystack.Any(specificChars.Contains);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
Result for 1,000,000 iterations:
Index of : 28/2718/2711
Index of Any : 153/141/17561
Regex of : 1068/1102/92324
Hashset : 939/891/111702
Notes:
Overall, regex
is slower then indexofany
by upto 10x depending on the haystack and needle sizes.
Upvotes: 6
Reputation: 3895
String.IndexOfAny(char[])
is implemented in the CLR itself and String.IndexOf
uses an extern call, so they're both pretty fast. Either will be much faster than using a regex.
Whether IndexOf
is better than IndexOfAny
depends on how many characters you're expecting to be checking for. Based on some really rough benchmarking, it looks like IndexOf
performs better (by a tiny amount) for 2 or fewer characters, but IndexOfAny
is better for 3 or more. The difference is minuscule, though - the advantage of using IndexOfAny
may be overwhelmed by the cost of allocating the array of characters.
Upvotes: 0
Reputation: 4234
if you must find one char only, it woulb be better call method IndexOf(singleChar) or IndexOf(singleChar, startIndex, charCount) instead.
Of cuorse Regex would be much more computational expensive!
Upvotes: 0