Reputation: 255
I am working on Data mining project and I've chosen Apriori algorithm for Association rules task. Briefly saying I am not satisfied with execution time the way I've implemented it. I will describe just problematic part of my code.
I have two Lists of lists.
List<List<int>> one;
List<List<int>> two;
I've to iterate through elements of list one
and check if one[i]
is subset of two[j]
foreach(List<int> items in one)
{
foreach(List<int> items2 in two)
{
if(items2.ContainsSetOf(items1))
{
//do something
}
}
I was thinking if there are way to reduce execution time for such apporoach. (Parallel execution, using dictionaries, etc)
Do you guys have any idea how it's possible to reduce it?
Thanks!
Upvotes: 3
Views: 252
Reputation: 56
C# code snippet
var dict = new Dictionary<int, HashSet<List<int>>>();
foreach (List<int> list2 in two) {
foreach (int i in list2) {
if(dict.ContainsKey(i) == FALSE) {
//create empty HashSet dict[i]
dict.Add(i, new HashSet<List<int>>());
}
//add reference to list2 to the HashSet dict[i]
dict[i].Add(list2);
}
}
foreach (List<int> list1 in one) {
HashSet<List<int>> listsInTwoContainingList1 = null;
foreach (int i in list1) {
if (listsInTwoContainingList1 == null) {
listsInTwoContainingList1 = new HashSet<List<int>>(dict[i]);
} else {
listsInTwoContainingList1.IntersectWith(dict[i]);
}
if(listsInTwoContainingList1.Count == 0) { //optimization :p
break;
}
}
foreach (List<int> list2 in listsInTwoContainingList1) {
//list2 contains list1
//do something
}
}
Example
L2= {
L2a = {10, 20, 30, 40}
L2b = {30, 40, 50, 60}
L2c = {10, 25, 30, 40}
}
L1 = {
L1a = {10, 30, 40}
L1b = {30, 25, 50}
}
After the first part of the code:
dict[10] = {L2a, L2c}
dict[20] = {L2a}
dict[25] = {L2c}
dict[30] = {L2a, L2b, L2c}
dict[40] = {L2a, L2b, L2c}
dict[50] = {L2c}
dict[60] = {L2c}
In the second part of the code:
L1a: dict[10] n dict[30] n dict[40] = {L2a, L2c}
L1b: dict[30] n dict[25] n dict[50] = { }
So L1a
is included in L2a
and L2c
, but L1b
in none.
Complexity
Now regarding the algorithm complexity, suppose L1
has n1
elements, L2
has n2
elements, the average number of elements of the sublists of L1
is m1
and the average number of elements of the sublists of L2
is m2
. Then:
the original solution is:
O(n1 x n2 x m1 x m2)
, if the containsSetOf method does a nested loop, or, at best, O(n1 x n2 x (m1 + m2))
, if it uses a HashSet. Is7aq's solution is also O(n1 x n2 x (m1 + m2))
.
the proposed solution is:
O(n2 x m2 + n1 x (m1 x nd + n2))
, where nd
is the average number of elements of the sets dict[i]
.
The efficiency of the proposed solution depends a lot on this nd
:
If nd
is large - close to n2
(when every integer is part of every sublist of L2
), then it is just as slow as the original one.
However, if nd
is expected to be small (i.e. the sublists of L2
are quite different from one another), then the proposed solution would typically be much faster, especially if n1
and n2
are large.
Upvotes: 1
Reputation: 2721
Make them lists of sets, and use set operations to find if a set of subset of another.
Example
HashSet<int> set1 = new HashSet<int>();
set1.Add(1);
set1.Add(2);
HashSet<int> set2 = new HashSet<int>();
set2.Add(1);
set2.Add(2);
set2.Add(3);
List<HashSet<int>> one = new List<HashSet<int>>();
one.add(set1);
one.add(set2);
List<HashSet<int>> two = new List<HashSet<int>>();
two.add(set1);
two.add(set2);
foreach(Set<int> setA in one) {
foreach(Set<int> setB in two) {
if(setA.IsSubsetOf(setB)) {
// do something
}
}
}
Upvotes: 4
Reputation: 15893
If you want to reduce the number of checks "Is list in list" (or is set a subset), one way is to build a hierarchy (tree) of lists. Of course, the performance improvement (if any) depends on the data - if none of the lists contain other lists, you will have to do all checks as now.
Upvotes: 1