Reputation: 208
Some Background
I am working on a problem where I have sets stored in a hashmap with keys being the set name, i.e. Set1--> a,b,c,e,g .... Set2--> a,g,h,f ... Set3--> b,c,e ... etc.
The aim of the program is take a value from the user as a "threshold" i.e. 2, which is used a the minimum commonality between sets. if the threshold is met or exceeded, the program suggests a merge between the sets.
I have created a combination creator that will generate every possible combination between set names for comparison with order not considered i.e. (Set1, Set2,),(Set1,Set3),(Set2,Set3), (Set1,Set2,Set3).
These sets of combinations are then used to actually compare the sets. If the Threshold is met, this combination is store in a seperate list to output to the user as a possible merge. Before this is outputted, these is some logic to delete child combinations i.e. if (Set1,Set2,Set3) is a possible merge, then you can disregard, the other 3 child combinations as this super combination already covers it. we then output the suggested merges.
The Problem
When we reach a certain number of sets to compare i.e. above 17 let's say, we get an out of memory issue because there are millions of combinations being created. I would like your help on understanding alternative approaches or how we could improve this approach. It works but it's not efficient enough :(
Combination Creator
/**
* Iterates through the setsToBeCompared ArrayList and gets all the combinations
*
* @return - ArrayList with all the possible combinations
*/
public ArrayList<String> generateCombinations(ArrayList<String> setsToBeCompared) {
List<List<String>> temp = new ArrayList<>();
ArrayList<String> a = new ArrayList<>();
for (int i = 2; i <= 3; i++) {
temp = calculateCombinations(setsToBeCompared, i);
for (List<String> list : temp) {
a.add(list.toString());
}
}
return a;
}
/**
* Calculates all the combination given by the parameters
*
* @param values - the names of the sets to be compared
* @param size - where to start from
* @return - List of all possible calculated combinations
*/
private List<List<String>> calculateCombinations(List<String> values, int size) {
if (0 == size) {
return Collections.singletonList(Collections.<String>emptyList());
}
if (values.isEmpty()) {
return Collections.emptyList();
}
List<List<String>> combination = new LinkedList<List<String>>();
String actual = values.iterator().next();
List<String> subSet = new LinkedList<String>(values);
subSet.remove(actual);
List<List<String>> subSetCombination = calculateCombinations(subSet, size - 1);
for (List<String> set : subSetCombination) {
List<String> newSet = new LinkedList<String>(set);
newSet.add(0, actual);
combination.add(newSet);
}
combination.addAll(calculateCombinations(subSet, size));
return combination;
}
Upvotes: 0
Views: 198
Reputation: 14348
So, summarizing points I posted as comments.
In your case, generating all subsets of sets is definitely not an option since number of such subsets will be ~2N. For N = 50 it is more than the Earth exists in nanoseconds.
I assume to switch from subsets of sets to subsets of their items. Say, M
subsets have N
distinct items, and merge threshold is T
. So you need to try just ~NT k-combinations of size T
looking which subsets could be merged through that combination of items, which for small T
is acceptable.
Algorithm will be the following:
let D - collection of initial sets
let S - collection of distinct elements in sets across D
for each k-combination c over S {
M = new M(c) // merge object, which stores subset of merged sets and k-combination by which they are merged
for each (s in D) {
if (s.containsAll(c))
M.sets.add(s)
}
if (M.sets.size > 0) // some sets was merged through c
merges.add(M)
}
After that, taking all possible pairs of merges, remove those that are fully covered by other merges:
for each m in merges {
for each m1 in merges {
if (m.sets.containsAll(m1.sets))
m1.markDeleted()
}
}
Upvotes: 0
Reputation: 9096
How about something like this (will use much less memory but you still need to examine large number of values - 2^N )
import static java.util.stream.IntStream.range;
public class Subsets implements Iterator<List<Integer>> {
private final int level;
private final LinkedList<List<Integer>> queue = new LinkedList<>();
public Subsets(int level) {
this.level = level;
range(0, level).forEach(i -> queue.add(Arrays.asList(i)));
}
@Override
public boolean hasNext() {
return !queue.isEmpty();
}
public List<Integer> next() {
List<Integer> list = queue.removeFirst();
int maxValue = list.get(list.size() - 1);
if(list.size() < level) {
for (int k = maxValue+1; k < level; k++) {
List<Integer> newList = new ArrayList<>(list);
newList.add(k);
queue.addFirst(newList);
}
}
return list;
}
public static void main(String[] args) {
Subsets s4 = new Subsets(4);
while (s4.hasNext()) {
System.err.println(s4.next());
}
}
}
To use this you will need to map the names of your sets (keys) to integers. Sample output:
[0]
[0, 3]
[0, 2]
[0, 2, 3]
[0, 1]
[0, 1, 3]
[0, 1, 2]
[0, 1, 2, 3]
[1]
[1, 3]
[1, 2]
[1, 2, 3]
[2]
[2, 3]
[3]
Upvotes: 0