silencer
silencer

Reputation: 2435

How to access last element of a multi-index dataframe

I have a dataframe with IDs and timestamps as a multi-index. The index in the dataframe is sorted by IDs and timestamps and I want to pick the lastest timestamp for each IDs. for example:

IDs    timestamp     value
0      2010-10-30     1
       2010-11-30     2
1      2000-01-01     300
       2007-01-01     33
       2010-01-01     400
2      2000-01-01     11

So basically the result I want is

IDs    timestamp    value
0      2010-11-30   2
1      2010-01-01   400
2      2000-01-01   11

What is the command to do that in pandas?

Upvotes: 6

Views: 2941

Answers (2)

Tai
Tai

Reputation: 7994

One can also use

df.groupby("IDs").tail(1)

This will take the last row of each label in level "IDs" and will not ignore NaN values.

Upvotes: 4

unutbu
unutbu

Reputation: 880767

Given this setup:

import pandas as pd
import numpy as np
import io

content = io.BytesIO("""\
IDs    timestamp     value
0      2010-10-30     1
0      2010-11-30     2
1      2000-01-01     300
1      2007-01-01     33
1      2010-01-01     400
2      2000-01-01     11""")

df = pd.read_table(content, header=0, sep='\s+', parse_dates=[1])
df.set_index(['IDs', 'timestamp'], inplace=True)

using reset_index followed by groupby

df.reset_index(['timestamp'], inplace=True)
print(df.groupby(level=0).last())

yields

              timestamp  value
IDs                           
0   2010-11-30 00:00:00      2
1   2010-01-01 00:00:00    400
2   2000-01-01 00:00:00     11

This does not feel like the best solution, however. There should be a way to do this without calling reset_index...


As you point out in the comments, last ignores NaN values. To not skip NaN values, you could use groupby/agg like this:

df.reset_index(['timestamp'], inplace=True)
grouped = df.groupby(level=0)
print(grouped.agg(lambda x: x.iloc[-1]))

Upvotes: 5

Related Questions