Reputation: 17568

What is the fastest, case insensitive, way to see if a string contains another string in C#?

EDIT 2:

Confirmed that my performance problems were due to the static function call to the StringExtensions class. Once removed, the IndexOf method is indeed the fastest way of accomplishing this.

What is the fastest, case insensitive, way to see if a string contains another string in C#? I see the accepted solution for the post here at Case insensitive 'Contains(string)' but I have done some preliminary benchmarking and it seems that using that method results in orders of magnitude slower calls on larger strings (> 100 characters) whenever the test string cannot be found.

Here are the methods I know of:

IndexOf:

public static bool Contains(this string source, string toCheck, StringComparison comp)
{
    if (string.IsNullOrEmpty(toCheck) || string.IsNullOrEmpty(source))
        return false;

    return source.IndexOf(toCheck, comp) >= 0;
}

ToUpper:

source.ToUpper().Contains(toCheck.ToUpper());

Regex:

bool contains = Regex.Match("StRiNG to search", "string", RegexOptions.IgnoreCase).Success;

So my question is, which really is the fastest way on average and why so?

EDIT:

Here is my simple test app I used to highlight the performance difference. Using this, I see 16 ms for ToLower(), 18 ms for ToUpper and 140 ms for the StringExtensions.Contains():

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;

namespace ScratchConsole
{
    class Program
    {
    static void Main(string[] args)
    {
        string input = "";
        while (input != "exit")
        {
            RunTest();
            input = Console.ReadLine();
        }
    }

    static void RunTest()
    {
        List<string> s = new List<string>();
        string containsString = "1";
        bool found;
        DateTime now;
        for (int i = 0; i < 50000; i++)
        {
            s.Add("AAAAAAAAAAAAAAAA AAAAAAAAAAAA");
        }

        now = DateTime.Now;
        foreach (string st in s)
        {
            found = st.ToLower().Contains(containsString);
        }
        Console.WriteLine("ToLower(): " + (DateTime.Now - now).TotalMilliseconds);

        now = DateTime.Now;
        foreach (string st in s)
        {
            found = st.ToUpper().Contains(containsString);
        }
        Console.WriteLine("ToUpper(): " + (DateTime.Now - now).TotalMilliseconds);


        now = DateTime.Now;
        foreach (string st in s)
        {
            found = StringExtensions.Contains(st, containsString, StringComparison.OrdinalIgnoreCase);
        }
        Console.WriteLine("StringExtensions.Contains(): " + (DateTime.Now - now).TotalMilliseconds);

    }
}

public static class StringExtensions
{
    public static bool Contains(this string source, string toCheck, StringComparison comp)
    {
        return source.IndexOf(toCheck, comp) >= 0;
    }
}

}

Upvotes: 15

Answers (3)

aepheus

Reputation: 8187

Since ToUpper would actually result in a new string being created, StringComparison.OrdinalIgnoreCase would be faster, also, regex has a lot of overhead for a simple compare like this. That said, String.IndexOf(String, StringComparison.OrdinalIgnoreCase) should be the fastest, since it does not involve creating new strings.

I would guess (there I go again) that RegEx has the better worst case because of how it evaluates the string, IndexOf will always do a linear search, I'm guessing (and again) that RegEx is using something a little better. RegEx should also have a best case which would likely be close, though not as good, as IndexOf (due to additional complexity in it's language).

15,000 length string, 10,000 loop

00:00:00.0156251 IndexOf-OrdinalIgnoreCase
00:00:00.1093757 RegEx-IgnoreCase 
00:00:00.9531311 IndexOf-ToUpper 
00:00:00.9531311 IndexOf-ToLower

Placement in the string also makes a huge difference:

At start:
00:00:00.6250040 Match
00:00:00.0156251 IndexOf
00:00:00.9687562 ToUpper
00:00:01.0000064 ToLower

At End:
00:00:00.5781287 Match
00:00:01.0468817 IndexOf
00:00:01.4062590 ToUpper
00:00:01.4218841 ToLower

Not Found:
00:00:00.5625036 Match
00:00:01.0000064 IndexOf
00:00:01.3750088 ToUpper
00:00:01.3906339 ToLower

Upvotes: 18

Senad Meškin

Reputation: 13756

This was interesting question to me, so I have created a little test using different methods

string content = "";
            for (var i = 0; i < 10000; i++)
                content = String.Format("{0} asldkfjalskdfjlaskdfjalskdfj laksdf lkwiuirh 9238 r9849r8 49834", content);

            string test = String.Format("{0} find_me {0}", content);

            string search = test;

            var tickStart = DateTime.Now.Ticks;
            //6ms
            //var b = search.ToUpper().Contains("find_me".ToUpper());

            //2ms
            //Match m = Regex.Match(search, "find_me", RegexOptions.IgnoreCase);


            //a little bit over 1ms
            var c = false;
            if (search.Length == search.ToUpper().Replace("find_me".ToUpper(), "x").Length)
                c = true;
            var tickEnd = DateTime.Now.Ticks;
            Debug.Write(String.Format("{0} {1}", tickStart, tickEnd));

So what I ahve done is created a string and searched in it

first method search.ToUpper().Contains("find_me".ToUpper()) 5ms

second method Match m = Regex.Match(search, "find_me", RegexOptions.IgnoreCase) 2ms

third method

if (search.Length == search.ToUpper().Replace("find_me".ToUpper(), "x").Length)
                c = true;

it took little more than 1ms

Upvotes: 0

Joe Mancuso

Reputation: 2129

I have found that a compiled RegEx is by far the fastest solution and is obviously much more versatile. Compiling it helps put it on par with smaller string comparisons and as you stated, there is no comparison with larger strings.

http://www.dijksterhuis.org/regular-expressions-advanced/ contains some hints to gain maximum speed from RegEx comparisons; you might find it helpful.

Upvotes: 2

What is the fastest, case insensitive, way to see if a string contains another string in C#?

Answers (3)

Related Questions