How do I delete rows which have only one entry in multi-index dataframe?

Question

I have the following type of muti-index dataframe:


import random
col3=[0,0,0,0,2,4,6,0,0,0,100,200,300,400]
col4=[0,0,0,0,4,6,8,0,0,0,200,900,400, 500]

d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6], 
 'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017], 'col3' : col3, 'col4' : col4 }
df = pd.DataFrame(data=d)
new_df = df.groupby(['Unit', 'Year']).sum()

           col3  col4     
Unit Year                      
1    2014     0     0      
     2015     0     0       
     2016     0     0      
     2017     0     0      
2    2015     2     4       
     2016     4     6  
     2017     6     8  
3    2017     0     0    
4    2014     0     0      
5    2015     0     0      
6    2014   100   200       
     2015   200   900  
     2016   300   400  
     2017   400   500

In reality it is larger ofcourse, but this does the job. In this dataframe I want to remove all Units, which have only one year entry. So I want to have this:

           col3  col4     
Unit Year                      
1    2014     0     0      
     2015     0     0       
     2016     0     0      
     2017     0     0      
2    2015     2     4       
     2016     4     6  
     2017     6     8         
6    2014   100   200       
     2015   200   900  
     2016   300   400  
     2017   400   500

Thank you in advance for help,

Jen

jezrael · Accepted Answer

Use GroupBy.transform with any column and test counts with GroupBy.size, compare for not equal by Series.ne and filter by boolean indexing:

df = new_df[new_df.groupby(level=0)['col3'].transform('size').ne(1)]

Or get values of index by Index.get_level_values and filter by Index.duplicated:

df = new_df[new_df.index.get_level_values(0).duplicated(keep=False)]

print (df)
           col3  col4
Unit Year            
1    2014     0     0
     2015     0     0
     2016     0     0
     2017     0     0
2    2015     2     4
     2016     4     6
     2017     6     8
6    2014   100   200
     2015   200   900
     2016   300   400
     2017   400   500

How do I delete rows which have only one entry in multi-index dataframe?

Answers (1)

Related Questions