Reputation: 143
I have the following type of muti-index dataframe:
import random
col3=[0,0,0,0,2,4,6,0,0,0,100,200,300,400]
col4=[0,0,0,0,4,6,8,0,0,0,200,900,400, 500]
d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017], 'col3' : col3, 'col4' : col4 }
df = pd.DataFrame(data=d)
new_df = df.groupby(['Unit', 'Year']).sum()
col3 col4
Unit Year
1 2014 0 0
2015 0 0
2016 0 0
2017 0 0
2 2015 2 4
2016 4 6
2017 6 8
3 2017 0 0
4 2014 0 0
5 2015 0 0
6 2014 100 200
2015 200 900
2016 300 400
2017 400 500
In reality it is larger ofcourse, but this does the job. In this dataframe I want to remove all Units, which have only one year entry. So I want to have this:
col3 col4
Unit Year
1 2014 0 0
2015 0 0
2016 0 0
2017 0 0
2 2015 2 4
2016 4 6
2017 6 8
6 2014 100 200
2015 200 900
2016 300 400
2017 400 500
Thank you in advance for help,
Jen
Upvotes: 4
Views: 124
Reputation: 863216
Use GroupBy.transform
with any column and test counts with GroupBy.size
, compare for not equal by Series.ne
and filter by boolean indexing
:
df = new_df[new_df.groupby(level=0)['col3'].transform('size').ne(1)]
Or get values of index by Index.get_level_values
and filter by Index.duplicated
:
df = new_df[new_df.index.get_level_values(0).duplicated(keep=False)]
print (df)
col3 col4
Unit Year
1 2014 0 0
2015 0 0
2016 0 0
2017 0 0
2 2015 2 4
2016 4 6
2017 6 8
6 2014 100 200
2015 200 900
2016 300 400
2017 400 500
Upvotes: 6