Looking for byte[] in larger byte[] similar to String's IndexOf()?

Question

Say I have a large byte[] and I'm not only looking to see if, but also where, a smaller byte[] is in the larger array. For example:

byte[] large = new byte[100];
for (byte i = 0; i < 100; i++) {
    large[i] = i;
}
byte[] small = new byte[] { 23, 24, 25 };

int loc = large.IndexOf(small); // this is what I want to write

I guess I'm asking about looking for a sequence of any type (primitive or otherwise) within a larger sequence.

I faintly remember reading about a specific approach to this in strings, but I don't remember the name of the algorithm. I could easily write some way to do this, but I know there's a good solution and it's on the tip of my tongue. If there's some .Net method that does this, I'll take that too (although I'd still appreciate the name of the searching algorithm for education's sake).

Sergey Kalinichenko · Accepted Answer

You can do it with LINQ, like this:

var res = Enumerable.Range(0, large.Length-1)
    .Cast()
    .FirstOrDefault(n => large.Skip(n.Value).Take(small.Length).SequenceEqual(small));
if (res != null) {
    Console.Println("Found at {0}", res.Value);
} else {
    Console.Println("Not found");
}

The approach is self-explanatory except for the Cast part: you need it to decide between finding the result at the initial location of the large array, when zero is returned, and not finding the result at all, when the return is null.

Here is a demo on ideone.

The complexity of the above is O(M*N), where M and N are lengths of the large and small arrays. If the large array is very long, and contains a significant number of "almost right" sub-sequences that match long prefixes of small, you may be better off implementing an advanced algorithm for searching sequences, such as the Knuth–Morris–Pratt (KMP) algorithm. The KMP algorithm speeds up the search by making an observation that when a mismatch occurs, the small sequence contains enough information on how far ahead you can move in the large sequence based on where in the small sequence is the first mismatch. A look-up table is prepared for the small sequence, and then that table is used throughout the search to decide how to advance the search point. The complexity of KMP is O(N+M). See the Wikipedia article linked above for pseudocode of the KMP algorithm.

Looking for byte[] in larger byte[] similar to String's IndexOf()?

Answers (2)

Related Questions

Looking for byte[] in larger byte[] similar to String&#39;s IndexOf()?

Answers (2)

Related Questions

Looking for byte[] in larger byte[] similar to String's IndexOf()?