bittusarkar
bittusarkar

Reputation: 6357

Performance of IndexOf(char) vs Contains(string) for checking the presence of a character in a string

I want to know if

string.IndexOf(char) 

is faster than

string.Contains(string) 

Intention is to check if a single character is present in a string. I know that as per requirement I should use string.Contains(string) but that is not the point of this question. I did try to disassemble mscorlib.dll in an attempt to compare their implementations but I could not find the implementation of

 string.IndexOf(char)

as it is implemented within the CLR itself. Implementation of

 string.Contains(string)

looks quite heavy though.

Upvotes: 0

Views: 6001

Answers (4)

GCymbala
GCymbala

Reputation: 373

I ran a test similar to Dmitry Bychenko's, but ran each test 1000 times. For each iteration, I generated a new, pseudo-random 20,000,001-character long string to avoid any distortion of results caused by any caching that might be happening. Looking at the .NET source code, it seems like most of these methods are ultimately calling unmanaged code, as Konstantin K. pointed out. And it appears that there can be a big difference in performance depending on whether your string can be an ASCII string or whether it contains Unicode characters. Check out that source.IndexOf("範") in .NET Core 6 below!

I ran the same tests in a .NET Framework 4.8 console application and a .NET Core 6.0 console application. Here are the results:

.NET Framework 4.8

"ASCII" String:
source.IndexOf('b')     Median:    2 ms    Average:   2.462 ms
source.IndexOf("b")     Median:   54 ms    Average:  55.432 ms
source.Contains('b')    Median:   46 ms    Average:  47.869 ms
source.Contains("b")    Median:   12 ms    Average:  12.773 ms

String containing Unicode characters:
source.IndexOf('範')    Median:    2 ms    Average:   2.030 ms
source.IndexOf("範")    Median:  116 ms    Average: 116.324 ms
source.Contains('範')   Median:   43 ms    Average:  43.218 ms
source.Contains("範")   Median:    7 ms    Average:   6.621 ms

.NET Core 6.0

"ASCII" String:
source.IndexOf('b')     Median:    0 ms    Average:  0.305 ms
source.IndexOf("b")     Median:   16 ms    Average: 15.789 ms
source.Contains('b')    Median:    0 ms    Average:  0.099 ms
source.Contains("b")    Median:    0 ms    Average:  0.300 ms

String containing Unicode characters:
source.IndexOf('範')     Median:   0 ms    Average:   0.354 ms
source.IndexOf("範")     Median: 259 ms    Average: 259.086 ms
source.Contains('範')    Median:   0 ms    Average:   0.027 ms
source.Contains("範")    Median:   0 ms    Average:   0.351 ms

BUT ...

How many times are you going to search a 20 million character long string for a single character? And how many times would you find yourself doing that 1000 times in a row? The point being that an end user probably would never notice if something took 2 milliseconds or 40. No need to do micro-optimizations, unless you truly need to. (And avoid String.IndexOf(String) with Chinese characters in .NET Core!)

Upvotes: 0

Konstantin K.
Konstantin K.

Reputation: 178

Source code for mscorlib.dll is available (and has been at the time of the original post) at Microsoft Reference Source. You can see there in String class, that Contains(string) is a wrapper

public bool Contains( string value ) {
    return ( IndexOf(value, StringComparison.Ordinal) >=0 );
}

around IndexOf function, which is using extern InternalFindNLSStringEx function, that must be pretty fast. But not so fast, as extern IndexOf(char) function, for obvious reasons. This explains, why the calculation speed for IndexOf(string) and Contains(string) is nearly the same. But be careful with your becnhmarks, because deep down in CLR it uses some sort of caching, I think, so the speed of one function depends on in which order you call it - before or after another.

This question wasn't present in original post, but as someone above mentioned this, let's also look at Contains<char> which is an Enumerable extension:

public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value, IEqualityComparer<TSource> comparer)
{
    if (comparer == null) comparer = EqualityComparer<TSource>.Default;
    if (source == null) throw Error.ArgumentNull("source");
    foreach (TSource element in source)
        if (comparer.Equals(element, value)) return true;
    return false;
}

So, it is going to call Equals on every element in sequence. This is neat, and allows you to use custom comparer, but using IndexOf(char) would be faster.

Upvotes: 3

Dmitrii Bychenko
Dmitrii Bychenko

Reputation: 186688

Just test and see

  String source = new String('a', 10000000) + "b" + new String('c', 10000000);

  Stopwatch sw = new Stopwatch();

  sw.Start();

  // Just an estimation
  int result = source.IndexOf('b');
  // int result = source.IndexOf("b");
  // Boolean result = source.Contains("b");
  // Boolean result = source.Contains('b');

  sw.Stop();

  int time = sw.ElapsedMilliseconds;

At my workstation (i5 3.2 GHz, .Net 5.0 64 bit) it takes about 10 ms for Char and 38 ms for String

Edit: Performance's outcome is

  IndexOf(Char)     10 -- Fastest
  IndexOf(String)   38 
  Contains(Char)   100 -- Slowest
  Contains(String)  41

so IndexOf(String) and Contains(String) are roughly the same

Upvotes: 9

DrKoch
DrKoch

Reputation: 9772

  1. You should write a small benchmark to find out.
  2. I predict this outcome: since IndexOf(char) is a simple loop and Contains() (and IndexOf(string)) is usually two nested loops, the former will be faster.

Upvotes: 0

Related Questions