How to sample a number of rows from a specific class in Python?

Question

I want to sample 2 rows from only class=1 in the "label" column.

In my code you will see that:

1) I extract ALL rows with class=1 (4 rows)

2) Then I sample 2 rows from the previous dataframe

But I am sure there must be a better way to do this.

import numpy as np
import pandas as pd

# Creation of the dataframe
df = pd.DataFrame(np.random.rand(12, 5))
label = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
df['label'] = label


# Sampling
df1 = df.loc[df['label'] == 1]  # Extract ALL samples with class=1
df2 = pd.concat(g.sample(2) for idx, g in df1.groupby('label'))  # Extract 2 samples from df1
df2

piRSquared · Accepted Answer

I'd just do this:

df.query('label == 1').sample(2)
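
For completeness, here is a minimal self-contained sketch of that one-liner next to an equivalent boolean-mask version. The seeded generator and the `random_state=0` argument are assumptions added only to make the draw reproducible; they are not part of the question, and the names `by_query` and `by_mask` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a dataframe like the one in the question (seeded here for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((12, 5)))
df['label'] = np.repeat([1, 2, 3], 4)  # four rows per class

# Filter to class 1 and sample 2 rows in a single chained expression
by_query = df.query('label == 1').sample(n=2, random_state=0)

# Same idea with a boolean mask instead of query()
by_mask = df[df['label'] == 1].sample(n=2, random_state=0)

print(by_query)
```

Both variants filter first and then sample without replacement, so the result always holds 2 distinct rows from class 1. Drop `random_state` to get a different draw on each call.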
