Reputation: 1216
Assume we have below lists:
List<int> Journey1 = new List<int>() { 1, 2, 3, 4, 5 };
List<int> Journey2 = new List<int>() { 2, 3, 4, 6, 7, 3, 4 };
List<int> Journey3 = new List<int>() { 6, 7, 1 };
List<int> Journey4 = new List<int>() { 3, 1, 4 };
And the patterns are:
2, 3, 4 -> Journey1, Journey2;
6, 7 -> Journey2, Journey3;
1 -> Journey2, Journey3, Journey4;
5 -> Journey1;
3, 4 -> Journey2;
3 -> Journey4;
4 -> Journey4;
We have 5000 lists, and each has around 200 items, so the patterns can have between 1-200 items and can be seen in 1-5000 lists.
Therefore I need very fast way of pattern matching.
Upvotes: 0
Views: 2029
Reputation:
In your case there are opportunities to benefit from pattern preprocessing as well as text preprocessing (http://en.wikipedia.org/wiki/String_searching_algorithm).
For instance, constructing a trie for all subsequences in a list will allow to query this list for a given pattern in time proportional to the pattern length.
Upvotes: 0
Reputation: 100545
For brute force approach you can try to use polynomial hash-functions to speed up sub-section matches. Still insane number of comparisons required, but at least match could be almost constant irrespective of sub-sequence length.
Upvotes: 0
Reputation: 112592
I am not sure what you want as output. I just made a Try.
I suggest that you make a list of lists, instead of declaring individual list variables.
List<List<int>> journeys = new List<List<int>>();
journeys.Add(new List<int>() { 1, 2, 3, 4, 5 });
journeys.Add(new List<int>() { 2, 3, 4, 6, 7, 3, 4 });
journeys.Add(new List<int>() { 6, 7, 1 });
journeys.Add(new List<int>() { 3, 1, 4 });
I assumed that the numbers range from 0 to 255. With this query
var result = Enumerable.Range(0, 256)
.Select(number => new
{
number,
listIndexes = journeys
.Select((list, index) => new { index, list })
.Where(a => a.list.Contains(number))
.Select(a => a.index)
.ToList()
})
.Where(b => b.listIndexes.Count > 0)
.ToList();
and this test loop
foreach (var item in result) {
Console.Write("Number {0} occurs in list # ", item.number);
foreach (var index in item.listIndexes) {
Console.Write("{0} ", index);
}
Console.WriteLine();
}
you will get this result
Number 1 occurs in list # 0 2 3 Number 2 occurs in list # 0 1 Number 3 occurs in list # 0 1 3 Number 4 occurs in list # 0 1 3 Number 5 occurs in list # 0 Number 6 occurs in list # 1 2 Number 7 occurs in list # 1 2
Where the lists are numbered starting at zero.
Upvotes: 1
Reputation: 3146
Without precomputation and with a naive on-the-fly search:
var matchedJourneys = journeys.Where(x => ContainsPattern(x, mypattern));
bool ContainsPattern(List<int> list, List<int> pattern)
{
for(int i = 0; i < list.Count - (pattern.Count - 1); i++)
{
var match = true;
for(int j = 0; j < pattern.Count; j++)
if(list[i + j] != pattern[j])
{
match = false;
break;
}
if(match) return true;
}
return false;
}
This will execute at max 200 million equals checks for your 'numbers'. But since checks are not expected to be executed for whole patterns, that could be (just a guess) ~5 million equals operations if checking all the lists. That's a few hundred milliseconds.
It all depends on what is 'very fast' for you. If that's too slow, you will need a much much more complicated approach ...
Upvotes: 2