CollaborativeLearner
CollaborativeLearner

Reputation: 71

Rust - How can I search a vec for a subset - and find the start index of the subvec?

If I have a vec, how can I search this to find whether it contains another vec - and return the index where this subvec begins?

let mut haystack = vec!(0, 0, 0, 1, 48, 120, 49, 49, 1, 0);
let mut needle = vec!(48, 120, 49, 49);

Such that it returns 4 (the starting index of this subset in the original) (or rather, it would return a Result which contains 4 in this case - and which would error if it is not found at all).

Upvotes: 2

Views: 2091

Answers (2)

Martin Berglund
Martin Berglund

Reputation: 31

I would use the windows iterator like:

fn find(haystack: &Vec<i32>, needle: &Vec<i32>) -> Option<usize> {
    for (position, window) in haystack.windows(needle.len()).enumerate() {
        if window == needle {
            return Some(position);
        }
    }
    None
}

Upvotes: 2

jferard
jferard

Reputation: 8190

This is a classic string search problem. @Willem Van Onsem suggested the KMP algorithm, but you should start with the naive algorithm.

For every index of haystack, try to compare the string of the same length as needle and starting at this index in haystack to needle itself.

Have a look at this:

0   0   0   1   48  120 49  49  1   0
48  120 49  49
x fail
    48  120 49  49
    x fail
        48  120 49  49
        x fail
            48  120 49  49
            x fail
                48  120 49  49
                -   -   -   - match!

x means that the elements are different, - that they are the same. On every x, shift to the next element of haystack (that's the difference with KMP which may shift more than one element at once).

In Rust, you will write something like:

fn find1(haystack: &Vec<i32>, needle: &Vec<i32>) -> i64 {
    for i in 0..haystack.len()-needle.len()+1 { // last indices of haystack are too far right to get a match
        let mut j = 0;
        while j < needle.len() { // check every char of needle
            if needle[j] != haystack[i+j] { // doesn't match
                break; // try the next i
            }
            j += 1; // else: match so far
        }
        if j == needle.len() { // no break: a full match was found
            return i as i64;
        }
    }
    return -1; // not a single full match
}

Of course, you can use some of Rust features to shorten the code (and avoid the C-like style above):

fn find2(haystack: &Vec<i32>, needle: &Vec<i32>) -> Option<usize> {
    for i in 0..haystack.len()-needle.len()+1 {
        if haystack[i..i+needle.len()] == needle[..] {
            return Some(i);
        }
    }
    None
}

Or the functional style if you prefer:

fn find3(haystack: &Vec<i32>, needle: &Vec<i32>) -> Option<usize> {
    (0..haystack.len()-needle.len()+1)
        .filter(|&i| haystack[i..i+needle.len()] == needle[..]).next()
}

If you understand the naive algorithm and its naive implementation, you can move to faster algorithms.

Upvotes: 3

Related Questions