Sarthak Maharana
Sarthak Maharana

Reputation: 11

Creating sub-lists

Let's say I have a list A of size 285. The first sub-list must have elements of A with size 228 (80% of 285). The second, of size 10% of A. The third, of size 10% of A. There should not be any common element at all. The entire process is randomized.

I'm aware of random.choices() and random.sample() but I en dup having common elements.

Upvotes: 1

Views: 70

Answers (3)

DarrylG
DarrylG

Reputation: 17166

We can use a technique commonly used in machine learning to partition data into training and test datasets.

Steps are:

  • Use random.shuffle to create a random ordering of data
  • Partition the shuffled data based upon sized of desired sublists

Code

import random

def partion_list(a):
  """Partiion list into sublists with 80%/10%/10% splits"""
  # Shallow copy of input list
  b = A[:] #shallow copy
  random.shuffle(b)  # inplace shuffle
  n = len(b)

  # Split with no common elements, but covers all the elements
  a1 = b[:int(0.8*n)]
  a2 = b[int(0.8*n):int(0.9*n)]
  a3 = b[int(0.9*n):]

  return a1, a2, a3

Test Code

A = list(range(285)) # test using list of numbers from 0 to 284
a1, a2, a3 = partion_list(A)

print('a1:', len(a1))
print('a2:', len(a2))
print('a3:', len(a3))

Output

a1: 228
a2: 28
a3: 29

Upvotes: 1

Karl Knechtel
Karl Knechtel

Reputation: 61643

If the order doesn't matter, it's simple: random.shuffle the entire list, and then take slices of the needed sizes.

If you need to pick out some elements and keep them in order, it gets trickier. The best I can think of is to just go through it mechanically: use random.sample to get the indices of the elements you want for the first sub-list; make that list; then remove those index positions and repeat for more sub-lists. To separate out the elements cleanly and avoid logic errors, we can use list comprehensions to build the sub-list as well as the new "remaining" pool. If you're using numpy, this can probably be done better with masks.

Upvotes: 1

Madhu Avinash
Madhu Avinash

Reputation: 971

Depending on the type of elements you can kind of put them in a hash map with a hashing algo of what you define.

Next iterate through the keys, and try to put them in your required sublists based on the count.

Upvotes: 1

Related Questions