eurojourney

Reputation: 101

Getting weighted random values from a list of lists with different list lengths

I need to create a new list that has random values pulled from a list of lists, where the secondary lists may be of different lengths.

The selection also needs to be weighted by length: if one of the secondary lists is longer than the others, the probability of drawing a value from it must be proportionally higher. Values may be selected more than once, so I don't have to remove a value from the list of lists after it has been chosen.

I was able to create the list of lists, where each secondary list corresponds to a region and its contents are randomly generated client codes. So far so good. But when I use random.choice() to build my new list, I get whole secondary lists picked at random, rather than random values picked from ALL lists.

import random

thislist = []

# So I have my blank list and I am ready to populate it with,
# in this case, 10 random values from the list of lists named 'codigo_cliente'

for i in range(10):
    thislist.append(random.choice(codigo_cliente))

Here are the client codes with 30 total clients in this example:

Clients Codes:

[['A-336', 'A-437', 'A-720', 'A-233', 'A-499'], 
 ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309']]

I am getting the following output, which is not what I want:

This is the random list of clients selected:

[['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'], 
 ['A-336', 'A-437', 'A-720', 'A-233', 'A-499'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309'], 
 ['A-336', 'A-437', 'A-720', 'A-233', 'A-499'], 
 ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309'], 
 ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'], 
 ['A-336', 'A-437', 'A-720', 'A-233', 'A-499']]

Instead, I should be getting something like, for example, the following:

thislist = ['A-336', 'B-553', 'C-596', 'B-910', 'C-251', 'C-329', 'B-910', 'A-437', 'B-946', 'C-251'] 

# Notice how there are more values with the "C" prefix from the larger secondary list,
# than values with the A or B prefixes from the smaller secondary lists.

Upvotes: 0

Views: 699

Answers (3)

jmm

Reputation: 394

Weighted Choice

random.choices(population, weights=None, *, cum_weights=None, k=1) accepts a list of weights for the random selection (k must be passed by keyword). You can therefore pass the lengths of the sublists as weights:

weights = [len(c) for c in codigo_cliente]

and let it select a sublist for you (with k=10 it selects 10 sublists, with replacement). From each selected sublist you can then pick a random element:

thislist = [random.choice(c) for c in random.choices(codigo_cliente, weights=weights, k=10)]

You can also pull it together for a one-liner solution:

thislist = [random.choice(c)
            for c in random.choices(codigo_cliente, weights=[len(c) for c in codigo_cliente], k=10)]

Reference: A weighted version of random.choice
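
A minimal, self-contained sketch of the weighted approach above, wrapped in a helper so it can be reused. The function name `weighted_sample` and the fixed seed are illustrative assumptions, not part of the original answer; `random.choices` requires Python 3.6 or later.

```python
import random

codigo_cliente = [
    ['A-336', 'A-437', 'A-720', 'A-233', 'A-499'],
    ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'],
    ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318',
     'C-550', 'C-422', 'C-251', 'C-852', 'C-309'],
]


def weighted_sample(groups, k):
    """Draw k items with replacement; each sublist is chosen in proportion to its length."""
    weights = [len(g) for g in groups]
    picked = random.choices(groups, weights=weights, k=k)  # k sublists, with replacement
    return [random.choice(g) for g in picked]              # one item from each picked sublist


random.seed(0)  # assumption: fixed seed only so the demo is repeatable
thislist = weighted_sample(codigo_cliente, 10)
print(thislist)
```

An empty sublist gets weight 0 and is never picked; if every sublist is empty, `random.choices` raises a `ValueError`.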

Flattened List

If you can afford the additional storage, you can flatten the list and do the selection on the flattened list like this:

import random
import itertools

codigo_cliente = [['A-336', 'A-437', 'A-720', 'A-233', 'A-499'],
                  ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'],
                  [
                      'C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318',
                      'C-550', 'C-422', 'C-251', 'C-852', 'C-309'
                  ]]
thislist = []
temp = list(itertools.chain.from_iterable(codigo_cliente))

for i in range(10):
    thislist.append(random.choice(temp))

print(thislist)

Different approaches to flatten nested lists can be found here: How to make a flat list out of list of lists?
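
If the flattened list is acceptable, the loop can also be replaced by a single call to `random.choices`, which samples with replacement. A short sketch, using a shortened data set for brevity (the shortened data and the name `flat` are illustrative):

```python
import itertools
import random

# Shortened, illustrative data; the full list of lists works the same way.
codigo_cliente = [['A-336', 'A-437'],
                  ['B-664', 'B-133', 'B-267'],
                  ['C-755', 'C-533', 'C-596', 'C-877']]

flat = list(itertools.chain.from_iterable(codigo_cliente))

# Uniform sampling over the flat list, with replacement. Longer sublists
# contribute more elements, so they are drawn from more often.
thislist = random.choices(flat, k=10)
print(thislist)
```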

Upvotes: 1

RootTwo

Reputation: 4418

Use random.choices() with the weights argument set to the lengths of the lists. This selects the lists in proportion to their length. Then use random.choice() to select an element from each list. k is the number of items to select:

from random import choice, choices

w = [len(d) for d in codigo_cliente]
[choice(lst) for lst in choices(codigo_cliente, weights=w, k=10)]

Sample output:

['C-400', 'C-596', 'B-553', 'C-471', 'B-133',
 'C-596', 'B-133', 'A-499', 'C-471', 'C-400']
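
Why this gives every client code the same chance: a sublist of length n is picked with probability n/N (N being the total number of codes), and an element within it with probability 1/n, so each code ends up with probability 1/N. The sketch below checks this with exact fractions; the sublist sizes 5, 10 and 15 are taken from the question, and the variable names are illustrative.

```python
from fractions import Fraction

sizes = {'A': 5, 'B': 10, 'C': 15}  # sublist lengths from the question
total = sum(sizes.values())         # N = 30

# P(item) = P(sublist chosen) * P(item chosen within that sublist)
per_item = {name: Fraction(n, total) * Fraction(1, n) for name, n in sizes.items()}

# P(the drawn item comes from this sublist) = n / N
per_group = {name: Fraction(n, total) for name, n in sizes.items()}

print(per_item)   # every value equals 1/30
print(per_group)  # A: 1/6, B: 1/3, C: 1/2
```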

Upvotes: 1

Saleem Ali

Reputation: 1383

You are not picking a random item from the nested lists, but a complete nested list.

First pick a random nested list, then choose an item from it at random:

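import random

thislist = []
for i in range(10):
    rand_list = random.choice(codigo_cliente)  # pick one sublist (each equally likely)
    thislist.append(random.choice(rand_list))  # then pick one item from that sublist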
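
Note that the loop above picks each sublist with equal probability, so codes in the shorter lists are individually more likely to appear than codes in the longer ones, which is not quite what the question asks for. A small sketch of the effect, using exact fractions and the sublist sizes from the question (variable names are illustrative):

```python
from fractions import Fraction

sizes = {'A': 5, 'B': 10, 'C': 15}  # sublist lengths from the question
n_groups = len(sizes)

# Uniform choice of sublist (1/3 each), then uniform choice of item within it.
uniform_per_item = {name: Fraction(1, n_groups) * Fraction(1, n)
                    for name, n in sizes.items()}

print(uniform_per_item)  # A: 1/15, B: 1/30, C: 1/45
```

To weight by length instead, the first `random.choice(codigo_cliente)` can be swapped for `random.choices(codigo_cliente, weights=[len(c) for c in codigo_cliente])[0]`.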

Upvotes: 1
