Ben
Ben

Reputation: 1216

Find Same patterns in lists

Assume we have below lists:

    List<int> Journey1 = new List<int>() { 1, 2, 3, 4, 5 };
    List<int> Journey2 = new List<int>() { 2, 3, 4, 6, 7, 3, 4 };
    List<int> Journey3 = new List<int>() { 6, 7, 1 };
    List<int> Journey4 = new List<int>() { 3, 1, 4 };

And the patterns are:

2, 3, 4 -> Journey1, Journey2;
6, 7    -> Journey2, Journey3;
1       -> Journey2, Journey3, Journey4;
5       -> Journey1;
3, 4    -> Journey2;
3       -> Journey4;
4       -> Journey4;

We have 5000 lists, and each has around 200 items, so the patterns can have between 1-200 items and can be seen in 1-5000 lists.

Therefore I need very fast way of pattern matching.

Upvotes: 0

Views: 2029

Answers (4)

user1196549
user1196549

Reputation:

In your case there are opportunities to benefit from pattern preprocessing as well as text preprocessing (http://en.wikipedia.org/wiki/String_searching_algorithm).

For instance, constructing a trie for all subsequences in a list will allow to query this list for a given pattern in time proportional to the pattern length.

Upvotes: 0

Alexei Levenkov
Alexei Levenkov

Reputation: 100545

For brute force approach you can try to use polynomial hash-functions to speed up sub-section matches. Still insane number of comparisons required, but at least match could be almost constant irrespective of sub-sequence length.

Upvotes: 0

Olivier Jacot-Descombes
Olivier Jacot-Descombes

Reputation: 112592

I am not sure what you want as output. I just made a Try.

I suggest that you make a list of lists, instead of declaring individual list variables.

List<List<int>> journeys = new List<List<int>>();
journeys.Add(new List<int>() { 1, 2, 3, 4, 5 });
journeys.Add(new List<int>() { 2, 3, 4, 6, 7, 3, 4 });
journeys.Add(new List<int>() { 6, 7, 1 });
journeys.Add(new List<int>() { 3, 1, 4 });

I assumed that the numbers range from 0 to 255. With this query

var result = Enumerable.Range(0, 256)
    .Select(number => new
    {
        number,
        listIndexes = journeys
            .Select((list, index) => new { index, list })
            .Where(a => a.list.Contains(number))
            .Select(a => a.index)
            .ToList()
    })
    .Where(b => b.listIndexes.Count > 0)
    .ToList();

and this test loop

foreach (var item in result) {
    Console.Write("Number {0} occurs in list # ", item.number);
    foreach (var index in item.listIndexes) {
        Console.Write("{0} ", index);
    }
    Console.WriteLine();
}

you will get this result

    Number 1 occurs in list # 0 2 3 
    Number 2 occurs in list # 0 1 
    Number 3 occurs in list # 0 1 3 
    Number 4 occurs in list # 0 1 3 
    Number 5 occurs in list # 0 
    Number 6 occurs in list # 1 2 
    Number 7 occurs in list # 1 2 

Where the lists are numbered starting at zero.

Upvotes: 1

doblak
doblak

Reputation: 3146

Without precomputation and with a naive on-the-fly search:

var matchedJourneys = journeys.Where(x => ContainsPattern(x, mypattern));

bool ContainsPattern(List<int> list, List<int> pattern)
{
    for(int i = 0; i < list.Count - (pattern.Count - 1); i++)
    {
        var match = true;
        for(int j = 0; j < pattern.Count; j++)  
            if(list[i + j] != pattern[j])
                     {
                         match = false;
                         break;
                     }
        if(match) return true;
    }
    return false;
}

This will execute at max 200 million equals checks for your 'numbers'. But since checks are not expected to be executed for whole patterns, that could be (just a guess) ~5 million equals operations if checking all the lists. That's a few hundred milliseconds.

It all depends on what is 'very fast' for you. If that's too slow, you will need a much much more complicated approach ...

Upvotes: 2

Related Questions