Retrieving a certain number of rows from a dataframe

Question

I have a dataframe,

import pandas as pd

df = pd.DataFrame({"X1": ["A", "B", "A", "B", "B", "C", "C", "C"],
                   "X2": ['FOO', 'BAR', 'FOO1', 'BAR1', 'FOO2', 'BAR2', 'FOO3', 'BAR3']})

    X1  X2
0   A   FOO
1   B   BAR
2   A   FOO1
3   B   BAR1
4   B   FOO2
5   C   BAR2
6   C   FOO3
7   C   BAR3

Calling value_counts on X1 gives A: 2, B: 3, C: 3. I want to extract rows according to the count of A, so that the resulting dataframe has 2 rows of A, 2 rows of B and 2 rows of C.

So the output should be:

    X1  X2
0   A   FOO
2   A   FOO1
1   B   BAR
3   B   BAR1
5   C   BAR2
6   C   FOO3

jezrael · Accepted Answer

Count the rows where X1 is 'A' by summing the boolean mask returned by Series.eq (the method form of ==). Then sort by column X1 and use GroupBy.head to keep the first N rows of each group:

N = df['X1'].eq('A').sum()
df = df.sort_values('X1').groupby('X1').head(N)
print(df)
  X1    X2
0  A   FOO
2  A  FOO1
1  B   BAR
3  B  BAR1
5  C  BAR2
6  C  FOO3
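
The same idea can be wrapped in a small reusable function. This is only a sketch: the name head_per_group and its parameters are hypothetical, not part of pandas, and kind="mergesort" is an assumption added here so that the sort is stable and rows keep their original relative order within each group.

```python
import pandas as pd


def head_per_group(df, key, ref_value):
    """Keep the first n rows of every group in `key`, where n is the
    number of rows whose `key` equals `ref_value` (hypothetical helper)."""
    # Summing the boolean mask counts the rows equal to ref_value.
    n = int(df[key].eq(ref_value).sum())
    # A stable sort keeps the original row order inside each group.
    ordered = df.sort_values(key, kind="mergesort")
    # GroupBy.head returns the first n rows of each group.
    return ordered.groupby(key).head(n)


df = pd.DataFrame({"X1": ["A", "B", "A", "B", "B", "C", "C", "C"],
                   "X2": ["FOO", "BAR", "FOO1", "BAR1", "FOO2", "BAR2", "FOO3", "BAR3"]})

result = head_per_group(df, "X1", "A")
print(result)
```

If ref_value does not occur in the column, n is 0 and the result is an empty dataframe, so it may be worth checking for that case before relying on the output.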
