Reputation: 75
I'm trying to get out of a dataframe with more than 10.000 rows a new dataframe which exists of the rows that holds Customer-ID's which are in the top 3 most common of the existing dataframe.
So when my existing dataframe looks like this:
Customer-ID Name Order-ID Year
0 1 John 00001 2014
1 2 Doe 00002 2014
2 3 Erik 00003 2015
3 4 Paul 00004 2015
4 5 Karin 00005 2016
5 1 John 00006 2016
6 1 John 00007 2016
7 2 Doe 00008 2016
8 3 Erik 00009 2017
9 1 John 00010 2018
10 2 Doe 00011 2018
I want my new dataframe looks like this (Because Customer ID 1 occurs 4 times, ID 2 occurs 3 times and ID 3 occurs 2 times, which are the top 3 most common):
Customer-ID Name Order-ID Year
0 1 John 00001 2014
1 2 Doe 00002 2014
2 3 Erik 00003 2015
3 1 John 00006 2016
4 1 John 00007 2016
5 2 Doe 00008 2016
6 3 Erik 00009 2017
7 1 John 00010 2018
8 2 Doe 00011 2018
I hope that someone can help me with it, thanks in advance!
Upvotes: 2
Views: 581
Reputation: 323386
You can check value_counts
with isin
yourdf=df[df['Customer-ID'].isin(df['Customer-ID'].value_counts().head(3).index)]
yourdf
Out[67]:
Customer-ID Name Order-ID Year
0 1 John 1 2014
1 2 Doe 2 2014
2 3 Erik 3 2015
5 1 John 6 2016
6 1 John 7 2016
7 2 Doe 8 2016
8 3 Erik 9 2017
9 1 John 10 2018
10 2 Doe 11 2018
Upvotes: 8