General Zlatan
General Zlatan

Reputation: 75

How to create new pandas dataframe from the top 10 of a certain column from existing dataframe

I'm trying to get out of a dataframe with more than 10.000 rows a new dataframe which exists of the rows that holds Customer-ID's which are in the top 3 most common of the existing dataframe.

So when my existing dataframe looks like this:

    Customer-ID  Name         Order-ID   Year
0   1            John         00001      2014
1   2            Doe          00002      2014
2   3            Erik         00003      2015
3   4            Paul         00004      2015
4   5            Karin        00005      2016
5   1            John         00006      2016
6   1            John         00007      2016
7   2            Doe          00008      2016
8   3            Erik         00009      2017
9   1            John         00010      2018
10  2            Doe          00011      2018

I want my new dataframe looks like this (Because Customer ID 1 occurs 4 times, ID 2 occurs 3 times and ID 3 occurs 2 times, which are the top 3 most common):

    Customer-ID  Name         Order-ID   Year
0   1            John         00001      2014
1   2            Doe          00002      2014
2   3            Erik         00003      2015
3   1            John         00006      2016
4   1            John         00007      2016
5   2            Doe          00008      2016
6   3            Erik         00009      2017
7   1            John         00010      2018
8   2            Doe          00011      2018

I hope that someone can help me with it, thanks in advance!

Upvotes: 2

Views: 581

Answers (1)

BENY
BENY

Reputation: 323386

You can check value_counts with isin

yourdf=df[df['Customer-ID'].isin(df['Customer-ID'].value_counts().head(3).index)]
yourdf
Out[67]: 
    Customer-ID  Name  Order-ID  Year
0             1  John         1  2014
1             2   Doe         2  2014
2             3  Erik         3  2015
5             1  John         6  2016
6             1  John         7  2016
7             2   Doe         8  2016
8             3  Erik         9  2017
9             1  John        10  2018
10            2   Doe        11  2018

Upvotes: 8

Related Questions