Reputation: 3318
I want to sample 2 rows from only the rows where the "label" column equals 1.
In my code you will see that:
1) I select ALL rows with class=1 (4 rows)
2) Then I sample 2 rows from that dataframe
But I am sure there must be a better way to do this.
import numpy as np
import pandas as pd
# Creation of the dataframe
df = pd.DataFrame(np.random.rand(12, 5))
label = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
df['label'] = label
# Sampling
df1 = df.loc[df['label'] == 1]  # Extract ALL rows with class=1
df2 = pd.concat(g.sample(2) for idx, g in df1.groupby('label'))  # Sample 2 rows from df1
df2
Upvotes: 1
Views: 2539
Reputation: 175
TL;DR
df = df[df.label == 1].sample(2)
The step df.label == 1
returns a boolean Series with one entry per row, True wherever the label column equals 1. The labels in your example are integers, so compare with the integer 1; the string '1' would match no rows. In your example only the first 4 rows are labeled 1, so the returned Series should be:
Index Bool
0 True
1 True
2 True
3 True
4 False
5 False
6 False
...
When you index the dataframe with this Series, only the rows marked True are kept, and sample(2) then draws 2 of those rows at random:
df = df[df.label == 1].sample(2)
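For reference, here is a minimal runnable sketch of the mask-then-sample approach, assuming integer labels as in the question. The fixed seed, the random_state argument, and the names mask and sampled are illustrative choices that were not in the original post; they only make the result reproducible and avoid overwriting df.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (integer labels 1, 2, 3).
rng = np.random.default_rng(0)  # assumption: fixed seed, only for reproducibility
df = pd.DataFrame(rng.random((12, 5)))
df['label'] = np.repeat([1, 2, 3], 4)

# Boolean mask: True for rows whose label is 1, False otherwise.
mask = df['label'] == 1

# Keep only the masked rows, then draw 2 of them without replacement.
sampled = df[mask].sample(n=2, random_state=0)
print(sampled)
```

Storing the result under a new name keeps the original dataframe available if you need to sample from another class afterwards.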
Upvotes: 0