Reputation: 2785
Say I have a list of numbers with duplicates.
import random
lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
random.shuffle(lst)
I want to split the list into the minimum number of sub"set"s, each containing only unique numbers, without discarding any numbers. I managed to write the following code, but it feels hard-coded, so there should be faster and more general solutions.
from collections import Counter
counter = Counter(lst)
maxcount = counter.most_common(1)[0][1]
res = []
while maxcount > 0:
    res.append(set(x for x in lst if counter[x] >= maxcount))
    maxcount -= 1
assert len([x for st in res for x in st]) == len(lst)
print(res)
Output:
[{4}, {8, 2, 4}, {0, 2, 3, 4, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}]
Obviously, this is only one of the solutions. Another solution could be
[{4, 9}, {8, 2, 4}, {0, 2, 3, 4, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}]
I want to find all possible solutions with the minimum number of sub"set"s (4 in this case). Note that equal numbers are indistinguishable: for a list of [1, 2, 1], [{1}, {1, 2}] is the same solution as [{1, 2}, {1}].
Any suggestions?
Upvotes: 5
Views: 510
Reputation: 70602
This way takes time linear in the number of list elements, and its output is the same (same sets, in the same order) regardless of the input list's order. It's basically a more "eager" variation of your code:
def split(xs):
    from collections import defaultdict
    x2count = defaultdict(int)
    result = []
    for x in xs:
        x2count[x] += 1
        count = x2count[x]
        if count > len(result):
            result.append(set())
        result[count - 1].add(x)
    return result
Then, e.g.,
xs = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
import random
random.shuffle(xs)
print(split(xs))
displays
[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 2, 3, 4, 7, 8}, {8, 2, 4}, {4}]
Finding all answers is bound to be annoying ;-) But straightforward enough. Once you know there are 4 sets in the result, you have a hairy kind of Cartesian product to compute. You know that, e.g., 7 appears twice, so there are comb(4, 2) == 6 ways to pick the two result sets 7 goes in. For each of those ways, you know that, e.g., 8 appears 3 times, so there are comb(4, 3) == 4 ways to pick the three result sets 8 goes in. Now we're up to 6 * 4 = 24 partial results. Repeat similarly for all other original integers. itertools.combinations() can be used to do the choosing.
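The counting in the paragraph above can be sketched directly. This is a small illustrative script (not part of the answer's code); the total it computes is the size of the raw sequence-of-sets product, before any deduplication:

```python
# Sketch of the counting argument: each distinct value with multiplicity m
# can be placed into the 4 result sets in comb(4, m) ways.
from math import comb, prod
from collections import Counter
from itertools import combinations

lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
counts = Counter(lst)
n = max(counts.values())           # 4 result sets are needed

print(comb(n, counts[7]))          # 6 ways to place the two 7s
print(comb(n, counts[8]))          # 4 ways to place the three 8s

# Size of the full Cartesian product (sequence-of-sets view, before
# collapsing reorderings of equal sets):
total = prod(comb(n, m) for m in counts.values())
print(total)                       # 884736

# itertools.combinations() does the choosing for one value:
print(list(combinations(range(n), counts[7])))
```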
Unclear: consider input [1, 1, 2]. The output here is [{1, 2}, {1}]. Do you, or do you not, consider that to be the same as [{1}, {1, 2}]? That is, do you consider an output to be a sequence of sets (in which case they're different), or a set of sets (in which case they're the same)? A straightforward Cartesian product takes the "it's a sequence" view.
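To make the distinction concrete for [1, 1, 2], here is a small illustrative sketch (not part of the answer's code) that enumerates the sequence-of-sets view and then collapses it to the set-of-sets view:

```python
# Two result sets are needed, since 1 appears twice.
from itertools import combinations

n = 2
results = []
for ixs in combinations(range(n), 1):   # where the single 2 goes
    result = [{1}, {1}]                 # the two 1s must occupy both sets
    result[ixs[0]].add(2)
    results.append(result)

print(results)       # two sequences: [{1, 2}, {1}] and [{1}, {1, 2}]
unique = {frozenset(map(frozenset, r)) for r in results}
print(len(unique))   # but only 1 set of sets
```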
Here's a way. As sketched, it computes a Cartesian product of all ways of distributing each element across the number of required output sets. But rather than use itertools.product() for this, it does it recursively, one element at a time. This allows it to check partial results so far for isomorphism, and to decline to extend any partial solution that's isomorphic to a partial solution it already extended.
Toward that end, it views a partial solution as a set of sets. For technical reasons, Python requires using a frozenset for a set that's going to be used in turn as a set element.
Caution: this generator yields the same result object every time. That's for efficiency. If you don't like that, you could, e.g., replace yield result with yield result[:] instead.
EDIT: note that I replaced the line sig = frozenset(map(frozenset, result)) with sig = frozenset(Counter(map(frozenset, result)).items()). That's because you really aren't viewing a result as a set of sets, but as a multiset of sets: a given set can appear more than once in a result, and the number of times it appears is significant. In fancier test cases than the ones given here, that can make a real difference.
A Counter is the closest thing Python has to a builtin multiset type, but there is no "frozen" Counter workalike akin to frozenset. So instead we turn the Counter into a sequence of 2-tuples and put those tuples into a frozenset. The (set, count) pairs account for the fact that the number of times a set appears in a result matters.
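A tiny demonstration of why the multiset signature matters (illustrative only; the lists a and b stand for hypothetical partial results):

```python
# The plain set-of-sets signature drops the number of times a repeated
# set occurs; the Counter-based signature keeps it.
from collections import Counter

a = [{1}, {1}, {2}]   # partial result where {1} appears twice
b = [{1}, {2}, {2}]   # a genuinely different partial result

sig_a = frozenset(map(frozenset, a))
sig_b = frozenset(map(frozenset, b))
print(sig_a == sig_b)    # True -- wrongly treated as the same

msig_a = frozenset(Counter(map(frozenset, a)).items())
msig_b = frozenset(Counter(map(frozenset, b)).items())
print(msig_a == msig_b)  # False -- multiplicities distinguish them
```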
def allsplit(xs):
    from collections import Counter
    from itertools import combinations
    c = Counter(xs)
    n = max(c.values())
    result = [set() for i in range(n)]
    pairs = list(c.items())
    pin = len(pairs)

    def inner(pi):
        if pi >= pin:
            yield result
            return
        elt, count = pairs[pi]
        seen = set()
        for ixs in combinations(range(n), count):
            for i in ixs:
                result[i].add(elt)
            sig = frozenset(Counter(map(frozenset, result)).items())
            if sig not in seen:
                yield from inner(pi + 1)
                seen.add(sig)
            for i in ixs:
                result[i].remove(elt)

    return inner(0)
Example:
>>> for x in allsplit([1, 1, 2, 3, 8, 4, 4]):
... print(x)
[{1, 2, 3, 4, 8}, {1, 4}]
[{1, 2, 3, 4}, {8, 1, 4}]
[{1, 2, 4, 8}, {1, 3, 4}]
[{1, 2, 4}, {1, 3, 4, 8}]
For your original example, it finds 36992 unique ways to partition the input.
Upvotes: 8
Reputation: 156
My simple solution, returning sets from biggest to smallest:
# Problem definition
import random
lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
random.shuffle(lst)
# Divide in sets, biggest first
l = lst.copy()
sets = []
while l:
    sets.append(set(l))
    for item in sets[-1]:
        l.remove(item)
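As a sanity check, the peeling loop above can be verified to preserve the input as a multiset and to use exactly max-multiplicity many sets (the loop is repeated here so the snippet is self-contained):

```python
from collections import Counter

lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]

# Peel off one copy of every remaining value per pass (same loop as above).
l = lst.copy()
sets = []
while l:
    sets.append(set(l))
    for item in sets[-1]:
        l.remove(item)

# Every element is accounted for, as a multiset:
assert Counter(x for s in sets for x in s) == Counter(lst)

# Minimality: equal values must land in different sets, so no answer can
# use fewer sets than the maximum multiplicity.
assert len(sets) == max(Counter(lst).values())
print(sets)
```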
Now the hard part: combinations. I suggest starting from something similar and expanding. What follows is just a proof of concept, and avoiding duplicates beyond the trivial "move the entire difference set across" case is not covered. A real implementation would cover all combinations of set-to-set transfers (just add another itertools.combinations level), but off the top of my head I have no idea how to cleverly handle moving elements from and to different sets in parallel.
import itertools

more_sets = [sets]
diff_0_1 = sets[0] - sets[1]
for comb_size in range(1, len(diff_0_1)):
    for comb in itertools.combinations(diff_0_1, comb_size):
        s0 = sets[0] - set(comb)
        s1 = sets[1] | set(comb)
        more_sets.append([s0, s1] + sets[2:])

for some_sets in more_sets:
    print(some_sets)
The above code returns this:
~ python3.8 tmp.py
[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 2, 3, 4, 7, 8}, {8, 2, 4}, {4}]
[{0, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 7, 8}, {8, 2, 4}, {4}]
[{0, 1, 2, 3, 4, 6, 7, 8, 9}, {0, 2, 3, 4, 5, 7, 8}, {8, 2, 4}, {4}]
[{0, 1, 2, 3, 4, 5, 7, 8, 9}, {0, 2, 3, 4, 6, 7, 8}, {8, 2, 4}, {4}]
[{0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 2, 3, 4, 7, 8, 9}, {8, 2, 4}, {4}]
[{0, 2, 3, 4, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 7, 8}, {8, 2, 4}, {4}]
[{0, 2, 3, 4, 5, 7, 8, 9}, {0, 1, 2, 3, 4, 6, 7, 8}, {8, 2, 4}, {4}]
[{0, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 7, 8, 9}, {8, 2, 4}, {4}]
[{0, 1, 2, 3, 4, 7, 8, 9}, {0, 2, 3, 4, 5, 6, 7, 8}, {8, 2, 4}, {4}]
[{0, 1, 2, 3, 4, 6, 7, 8}, {0, 2, 3, 4, 5, 7, 8, 9}, {8, 2, 4}, {4}]
[{0, 1, 2, 3, 4, 5, 7, 8}, {0, 2, 3, 4, 6, 7, 8, 9}, {8, 2, 4}, {4}]
[{0, 2, 3, 4, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {8, 2, 4}, {4}]
[{0, 2, 3, 4, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 7, 8, 9}, {8, 2, 4}, {4}]
[{0, 2, 3, 4, 5, 7, 8}, {0, 1, 2, 3, 4, 6, 7, 8, 9}, {8, 2, 4}, {4}]
[{0, 1, 2, 3, 4, 7, 8}, {0, 2, 3, 4, 5, 6, 7, 8, 9}, {8, 2, 4}, {4}]
Upvotes: 1
Reputation: 54148
I'd suggest using a prefilled list, then storing each value in its own bucket:
import random
from collections import Counter
lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
random.shuffle(lst)
c = Counter(lst)
maxcount = c.most_common(1)[0][1]
result = [set() for _ in range(maxcount)]
for k, v in c.items():
    for i in range(v):
        result[i].add(k)
print(result)
Can also be achieved with a defaultdict
from collections import defaultdict

c = Counter(lst)
result = defaultdict(set)
for k, v in c.items():
    for i in range(v):
        result[i].add(k)
result = list(result.values())
print(result)
Note on performance
from timeit import timeit
import numpy as np
lst = list(np.random.randint(0, 100, 10000))
nb = 1000
print(timeit(lambda: prefilled_list(lst), number=nb)) # 2.144
print(timeit(lambda: default_dict_set(lst), number=nb)) # 1.903
print(timeit(lambda: op_while_loop(lst), number=nb)) # 318.2
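The benchmark assumes the three approaches are wrapped as functions. A plausible reconstruction (the names are taken from the timeit calls above, the bodies from the snippets in this thread):

```python
# Hypothetical reconstruction of the three timed functions.
from collections import Counter, defaultdict

def prefilled_list(lst):
    # prefilled list of sets, one bucket per multiplicity level
    c = Counter(lst)
    maxcount = c.most_common(1)[0][1]
    result = [set() for _ in range(maxcount)]
    for k, v in c.items():
        for i in range(v):
            result[i].add(k)
    return result

def default_dict_set(lst):
    # same idea, buckets created on demand by a defaultdict
    c = Counter(lst)
    result = defaultdict(set)
    for k, v in c.items():
        for i in range(v):
            result[i].add(k)
    return list(result.values())

def op_while_loop(lst):
    # the question's original approach: rescans lst once per level
    counter = Counter(lst)
    maxcount = counter.most_common(1)[0][1]
    res = []
    while maxcount > 0:
        res.append(set(x for x in lst if counter[x] >= maxcount))
        maxcount -= 1
    return res
```

Note that op_while_loop builds its sets smallest-first, so its output is the reverse of the other two.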
Upvotes: 4