arkadiy

Reputation: 766

How to randomly sample a set number of rows from a dataframe with a preset condition?

I have the following dataframe:

import pandas as pd

d = {'Pic': ['D1', 'D1', 'D2', 'D2', 'D3', 'D3', 'D4', 'D4'], 'Rating': [42, 54, 61, 72, 43, 52, 91, 22], 'Pair': [1, 2, 1, 2, 1, 2, 1, 2]}
df = pd.DataFrame(data=d)
df
  Pic  Rating  Pair
0  D1      42     1
1  D1      54     2
2  D2      61     1
3  D2      72     2
4  D3      43     1
5  D3      52     2
6  D4      91     1
7  D4      22     2

I need to randomly select 2 unique values from the Pic column, and whenever a value is picked (say, 'D1'), its corresponding pair must be picked as well (so both the D1 row with Pair 1 and the D1 row with Pair 2 are selected).

I tried the following:

df_Selected = df.sample(n=2, random_state=2)

But I am not sure how to make sure that each value that is randomly selected from the 'Pic' column, also has its pair selected from the 'Pair' column. So if the following row is randomly picked:

  Pic  Rating  Pair
0  D1      42     1

I would also need the following row to be picked:

  Pic  Rating  Pair
1  D1      54     2

Upvotes: 3

Views: 135

Answers (1)

Matthew Borish

Reputation: 3086

import pandas as pd
import random

d = {'Pic': ['D1', 'D1', 'D2', 'D2', 'D3', 'D3', 'D4', 'D4'],
     'Rating': [42, 54, 61, 72, 43, 52, 91, 22],
     'Pair': [1, 2, 1, 2, 1, 2, 1, 2]}

df = pd.DataFrame(data=d)

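# Pick 2 distinct Pic values at random (values, not rows).
random_pic_list = random.sample(df['Pic'].unique().tolist(), 2)

# Keep every row whose Pic is one of the chosen values, so both Pair rows come along.
df_slice = df[df['Pic'].isin(random_pic_list)]

print(df_slice)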

  Pic  Rating  Pair
0  D1      42     1
1  D1      54     2
2  D2      61     1
3  D2      72     2
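
The selection above changes on every run because random.sample is not seeded. If a reproducible pick is wanted, as with the random_state=2 in the question, one possible variant is to sample the unique Pic values with pandas itself. This is only a sketch: the helper name sample_groups and its parameters key, n and seed are made up for illustration and are not part of pandas.

```python
import pandas as pd


def sample_groups(df, key, n, seed=None):
    """Return all rows belonging to n randomly chosen unique values of df[key].

    Hypothetical helper for illustration; not a pandas API.
    """
    # drop_duplicates() leaves one entry per group, so sampling it
    # picks whole groups rather than individual rows.
    chosen = df[key].drop_duplicates().sample(n=n, random_state=seed)
    # Keep every row whose key is among the chosen groups (original order).
    return df[df[key].isin(chosen)]


d = {'Pic': ['D1', 'D1', 'D2', 'D2', 'D3', 'D3', 'D4', 'D4'],
     'Rating': [42, 54, 61, 72, 43, 52, 91, 22],
     'Pair': [1, 2, 1, 2, 1, 2, 1, 2]}
df = pd.DataFrame(data=d)

# Same seed, same two Pic values on every run.
df_selected = sample_groups(df, 'Pic', n=2, seed=2)
print(df_selected)
```

With n=2 this returns 4 rows (both Pair rows for each of the 2 chosen Pic values), and calling it again with the same seed gives the same rows.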

Upvotes: 1
