Aizzaac

Reputation: 3318

How to sample a number of rows from a specific class in Python?

I want to sample 2 rows only from class 1 in the "label" column.

In my code you will see that:

1) I sample ALL rows from class=1 (4 rows)

2) Then I sample 2 rows from the previous dataframe

But I am sure there must be a better way to do this.

import numpy as np
import pandas as pd

# Creation of the dataframe
df = pd.DataFrame(np.random.rand(12, 5))
label = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
df['label'] = label


# Sampling
df1 = df.loc[df['label'] == 1]  # Extract ALL rows with class=1
df2 = pd.concat(g.sample(2) for idx, g in df1.groupby('label'))  # Extract 2 rows from df1
df2


Upvotes: 1

Views: 2539

Answers (2)

Luan Souza

Reputation: 175

TL;DR

df = df[df.label == 1].sample(2)

Explanation

The step df.label == 1 returns a boolean Series that is True for every row whose label column equals 1. The labels in your example are integers, so the comparison must use 1, not the string '1'. In your example only the first 4 rows are labeled 1, so the returned Series is:

Index  Bool
0      True
1      True
2      True
3      True
4      False
5      False
6      False
...

When you pass this mask to the dataframe, it keeps only the rows where the mask is True, and .sample(2) then draws 2 of those rows at random:

df = df[df.label == 1].sample(2)
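To make the two steps visible, here is a minimal sketch that rebuilds the dataframe from the question and separates the mask from the sampling. It assumes the labels are integers, as in the question's code. The names mask, subset and picked, and the random_state=0 seed (used only to make the draw repeatable), are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same layout as the question: 12 rows, 5 random columns, integer labels 1, 2, 3
df = pd.DataFrame(np.random.rand(12, 5))
df['label'] = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

# Step 1: boolean mask, True only for the rows whose label is 1
mask = df['label'] == 1

# Step 2: keep the rows where the mask is True
subset = df[mask]

# Step 3: draw 2 of those rows at random (the seed is optional)
picked = subset.sample(n=2, random_state=0)
print(picked)
```

Storing the result under a new name instead of reassigning df keeps the full dataframe available for further sampling.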

Upvotes: 0

piRSquared

Reputation: 294258

I'd just do this:

df1.query('label == 1').sample(2)
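If the class and the sample size vary, the same query-then-sample idea can be wrapped in a small helper. This is only a sketch: the function name sample_from_class, its parameters, and the string column names a to e are hypothetical and chosen for illustration. The @wanted syntax is how query refers to a local Python variable.

```python
import numpy as np
import pandas as pd


def sample_from_class(frame, wanted, n, seed=None):
    """Return n random rows of frame whose 'label' column equals wanted."""
    # '@wanted' tells query() to read the local variable named wanted
    matches = frame.query('label == @wanted')
    return matches.sample(n=n, random_state=seed)


# Example data shaped like the question's dataframe, with string column names
data = pd.DataFrame(np.random.rand(12, 5), columns=list('abcde'))
data['label'] = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

two_rows = sample_from_class(data, wanted=1, n=2, seed=42)
print(two_rows)
```

Note that sample raises a ValueError if n is larger than the number of matching rows, unless replace=True is passed.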


Upvotes: 4
