Reputation: 1122
I am trying to implement a weighted random selection in a dataframe. I used the code below to build the dataframe:
import pandas as pd
from numpy import exp
import random
moves = [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4)]
data = {'moves': list(map(lambda i: moves[i] if divmod(i, len(moves))[0] != 1 else moves[divmod(i, len(moves))[1]],
[i for i in range(2 * len(moves))])),
'player': list(map(lambda i: 1 if i >= len(moves) else 2,
[i for i in range(2 * len(moves))])),
'wins': [random.randint(0, 2) for i in range(2 * len(moves))],
'playout_number': [random.randint(0,1) for i in range(2 * len(moves))]
}
frame = pd.DataFrame(data)
and then I created a list and inserted it as the new column 'weight':
total = sum(map(lambda a, b: exp(a/b) if b != 0 else 0, frame['wins'], frame['playout_number']))
weights = list(map(lambda a, b: exp(a/b) / total if b != 0 else 0, frame['wins'], frame['playout_number']))
frame = frame.assign(weight=weights)
Now I want to select a random row based on each row's weight in the new column inserted.
The problem is that I want to use pandas.DataFrame.sample(weights=weight)
, But I don't know how. I can do that with numpy.random.choice(weights=weights)
, But I'd prefer keep using pandas library functions.
I appreciate helps in advance.
Upvotes: 1
Views: 4570
Reputation: 862601
You can use parameters n
or frac
with weights
in sample
.
Parameter weights
can be array
, so is possible use list
:
df = frame.sample(n=1, weights=weights)
Or column of df
(Series
):
#select 1 row - n=1
df = frame.sample(n=1, weights=frame.weight)
print (df)
moves player playout_number wins weight
6 (1, 2) 1 1 2 0.258325
#select 20% rows - frac=0.2
df = frame.sample(frac=0.2, weights=frame.weight)
print (df)
moves player playout_number wins weight
5 (2, 4) 2 1 2 0.221747
4 (2, 3) 2 1 1 0.081576
Upvotes: 4