How to get all last rows at second level in MultiIndex DataFrame whose second level has variable length

Question

I have this dataframe:

df = pd.DataFrame({
    'thread_id': [0,0,1,1,1,2,2],
    'message_id_in_thread': [0,1,0,1,2,0,1],
    'text': ['txt0', 'txt1', 'txt2', 'txt3', 'txt4', 'txt5', 'txt6']
}).set_index(['thread_id', 'message_id_in_thread'])

And I want to keep all the last second level rows, meaning that:

For thread_id==0 I want to keep the row message_id_in_thread==1
For thread_id==1 I want to keep the row message_id_in_thread==2
For thread_id==2 I want to keep the row message_id_in_thread==1

This can easily be achieved by doing df.iterrows(), but I would like to know if there is any direct indexing method.

I look for something like df.loc[(:, -1)], which selects from all (:) level 1 groups, the last (-1) row of that block/group, but obviously this does not work.

jezrael · Accepted Answer

If need both levels use GroupBy.tail:

df = df.groupby(level=0).tail(1)
print (df)
                                text
thread_id message_id_in_thread      
0         1                     txt1
1         2                     txt4
2         1                     txt6

If need only first level use GroupBy.last or GroupBy.nth:

df = df.groupby(level=0).last()
#df = df.groupby(level=0).nth(-1)
print (df)
           text
thread_id      
0          txt1
1          txt4
2          txt6

How to get all last rows at second level in MultiIndex DataFrame whose second level has variable length

Answers (1)

Related Questions