Reputation: 11
Let's say I have a list A of size 285. The first sub-list must contain 228 of A's elements (80% of 285). The second and the third sub-lists should each contain 10% of A. There must not be any common element between the sub-lists, and the entire process should be randomized.
I'm aware of random.choices() and random.sample(), but I end up having common elements.
Upvotes: 1
Views: 70
Reputation: 17166
We can use a technique commonly used in machine learning to partition data into training and test datasets.
Steps are:
1. Make a shallow copy of the input list.
2. Shuffle the copy in place with random.shuffle.
3. Slice the shuffled copy into 80%/10%/10% segments; the slices cannot overlap, so there are no common elements.
Code
import random
def partition_list(a):
    """Partition a list into sublists with 80%/10%/10% splits"""
    b = a[:]           # shallow copy of the input list
    random.shuffle(b)  # in-place shuffle
    n = len(b)
    # The slices share no elements but together cover the whole list
    a1 = b[:int(0.8*n)]
    a2 = b[int(0.8*n):int(0.9*n)]
    a3 = b[int(0.9*n):]
    return a1, a2, a3
Test Code
A = list(range(285)) # test using list of numbers from 0 to 284
a1, a2, a3 = partition_list(A)
print('a1:', len(a1))
print('a2:', len(a2))
print('a3:', len(a3))
Output
a1: 228
a2: 28
a3: 29
Upvotes: 1
Reputation: 61643
If the order doesn't matter, it's simple: random.shuffle the entire list, and then take slices of the needed sizes.
If you need to pick out some elements and keep them in order, it gets trickier. The best I can think of is to just go through it mechanically: use random.sample to get the indices of the elements you want for the first sub-list; make that list; then remove those index positions and repeat for the remaining sub-lists. To separate out the elements cleanly and avoid logic errors, we can use list comprehensions to build the sub-list as well as the new "remaining" pool. If you're using numpy, this can probably be done better with masks.
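A minimal sketch of that index-sampling approach, assuming an 80%/10%/10% split like the question's; the helper name ordered_splits and the fractions parameter are illustrative, not part of the answer above:
import random

def ordered_splits(a, fractions=(0.8, 0.1, 0.1)):
    """Split `a` into disjoint sublists, each keeping the original order."""
    remaining = list(a)
    n = len(a)
    sizes = [int(f * n) for f in fractions]
    sizes[-1] = n - sum(sizes[:-1])   # last sub-list absorbs the rounding leftover

    result = []
    for size in sizes:
        # Sample positions in the current pool, not element values
        picked = set(random.sample(range(len(remaining)), size))
        # List comprehensions build the sub-list and the new "remaining" pool,
        # both in their original order
        result.append([x for i, x in enumerate(remaining) if i in picked])
        remaining = [x for i, x in enumerate(remaining) if i not in picked]
    return result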
Upvotes: 1
Reputation: 971
Depending on the element type, you could put the elements in a hash map keyed by a hash function of your choosing.
Then iterate through the keys and assign them to your required sublists based on the target counts.
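A minimal sketch of one way to read this, assuming the goal is the same 80%/10%/10% split; keying by position (so duplicate values survive) and the split_via_dict name are my own choices for illustration:
import random

def split_via_dict(items, fractions=(0.8, 0.1, 0.1)):
    # Key each element by its position, shuffle the keys, then hand out
    # keys to the sublists according to the target counts
    pool = {i: x for i, x in enumerate(items)}
    keys = list(pool)
    random.shuffle(keys)

    n = len(keys)
    sizes = [int(f * n) for f in fractions]
    sizes[-1] = n - sum(sizes[:-1])  # last bucket absorbs rounding

    sublists, start = [], 0
    for size in sizes:
        sublists.append([pool[k] for k in keys[start:start + size]])
        start += size
    return sublists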
Upvotes: 1