masouduut94

Reputation: 1122

How to assign a column to dataframe as weights for each row and then sample the dataframe according to those weights?

I am trying to implement a weighted random selection in a dataframe. I used the code below to build the dataframe:

import pandas as pd
from numpy import exp
import random

moves = [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4)]


n = len(moves)
data = {
    # Each move appears twice: first half for player 2, second half for player 1.
    'moves': moves * 2,
    'player': [2] * n + [1] * n,
    'wins': [random.randint(0, 2) for _ in range(2 * n)],
    'playout_number': [random.randint(0, 1) for _ in range(2 * n)],
}
frame = pd.DataFrame(data)

Then I computed a list of weights and added it as a new column, 'weight':

total = sum(map(lambda a, b: exp(a/b) if b != 0 else 0, frame['wins'], frame['playout_number']))
weights = list(map(lambda a, b: exp(a/b) / total if b != 0 else 0, frame['wins'], frame['playout_number']))
frame = frame.assign(weight=weights)
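As an aside, the same weights could be computed without Python-level lambdas. The following is only a minimal sketch: the small `frame` is a hypothetical stand-in with fixed values, and it assumes at least one row has a non-zero `playout_number` so the total is not zero.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame, with fixed values for clarity.
frame = pd.DataFrame({
    'wins': [2, 0, 1, 1],
    'playout_number': [1, 0, 1, 0],
})

wins = frame['wins'].to_numpy(dtype=float)
playouts = frame['playout_number'].to_numpy(dtype=float)

# Divide only where playouts is non-zero; the other entries stay at 0.
ratio = np.divide(wins, playouts, out=np.zeros_like(wins), where=playouts != 0)

# Rows with no playouts get a raw weight of 0, as in the original lambda.
raw = np.where(playouts != 0, np.exp(ratio), 0.0)

# Normalize so the weights sum to 1 (assumes raw.sum() > 0).
frame['weight'] = raw / raw.sum()
```

This mirrors the `exp(a/b) / total if b != 0 else 0` logic above while avoiding a division-by-zero warning for rows without playouts.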

Now I want to select a random row according to each row's weight in the new column.
I would like to use pandas.DataFrame.sample with its weights parameter, but I don't know how. I can do this with numpy.random.choice(..., p=weights), but I'd prefer to keep using pandas functions.
Thanks in advance for any help.
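For reference, the NumPy route mentioned above might look roughly like this. It is a sketch under assumptions: the small `frame` is a hypothetical stand-in, the weights already sum to 1 (NumPy requires that for `p`), and it uses `numpy.random.default_rng` rather than the legacy `numpy.random.choice` function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame; weights already sum to 1.
frame = pd.DataFrame({
    'moves': [(1, 2), (1, 3), (1, 4)],
    'weight': [0.5, 0.5, 0.0],
})

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Draw one index label with probability given by the weight column.
label = rng.choice(frame.index.to_numpy(), p=frame['weight'].to_numpy())

# Look the label up again to get the selected row as a one-row DataFrame.
row = frame.loc[[label]]
```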

Upvotes: 1

Views: 4570

Answers (1)

jezrael

Reputation: 862601

You can combine the weights parameter of sample with either n or frac.

The weights parameter accepts an array-like, so you can pass the list directly:

df = frame.sample(n=1, weights=weights)

Or pass a column of the DataFrame (a Series):

#select 1 row - n=1
df = frame.sample(n=1, weights=frame.weight)
print (df)
    moves  player  playout_number  wins    weight
6  (1, 2)       1               1     2  0.258325

#select 20% of the rows - frac=0.2
df = frame.sample(frac=0.2, weights=frame.weight)
print (df)
    moves  player  playout_number  wins    weight
5  (2, 4)       2               1     2  0.221747
4  (2, 3)       2               1     1  0.081576
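A few related behaviours of sample can be sketched as follows: when called on a DataFrame, weights also accepts a column name; weights that do not sum to 1 are normalized; and rows with zero weight are never drawn. The data below is hypothetical, and random_state is set only to make the sketch repeatable.

```python
import pandas as pd

# Hypothetical data: raw weights that do not sum to 1, with one zero weight.
frame = pd.DataFrame({
    'moves': [(1, 2), (1, 3), (1, 4), (2, 1)],
    'weight': [3.0, 1.0, 0.0, 1.0],
})

# Pass the column name instead of a Series; random_state fixes the draw.
one = frame.sample(n=1, weights='weight', random_state=0)

# Draw many rows with replacement to see the weights at work.
many = frame.sample(n=1000, weights='weight', replace=True, random_state=0)

# Count how often each original row label was drawn.
counts = many.index.value_counts()
```

In `counts`, row 0 should appear most often, and row 2 (weight 0) should not appear at all.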

Upvotes: 4
