Michael

Indexing a pandas DataFrame to return the first data point from each day

In Pandas, I have a DataFrame with datetimes in a column (not the index), which span several days, and are at irregular time intervals (i.e. not periodic). I want to return the first value from each day. So if my datetime column looked like:

2013-01-01 01:00
2013-01-01 05:00
2013-01-01 14:00
2013-01-02 01:00
2013-01-02 05:00
2013-01-04 14:00

The command I'm looking for would return the rows with the following datetimes:

2013-01-01 01:00
2013-01-02 01:00
2013-01-04 14:00

Answers (1)

unutbu

With this setup:

import pandas as pd

# Sample timestamps at irregular intervals, one per line
data = '''\
2013-01-01 01:00
2013-01-01 05:00
2013-01-01 14:00
2013-01-02 01:00
2013-01-02 05:00
2013-01-04 14:00'''
dates = pd.to_datetime(data.splitlines())
# 'val' stands in for the other columns of the real DataFrame
df = pd.DataFrame({'date': dates, 'val': range(len(dates))})

>>> df
                 date  val
0 2013-01-01 01:00:00    0
1 2013-01-01 05:00:00    1
2 2013-01-01 14:00:00    2
3 2013-01-02 01:00:00    3
4 2013-01-02 05:00:00    4
5 2013-01-04 14:00:00    5

Since the rows are already in chronological order, you can produce the desired DataFrame using groupby and agg:

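# Group rows by calendar day, using a 'YYYYMMDD' string as the key
grouped = df.groupby([d.strftime('%Y%m%d') for d in df['date']])
# Take the first (non-null) value of each column within each day
newdf = grouped.agg('first')
print(newdf)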

yields

                        date  val
20130101 2013-01-01 01:00:00    0
20130102 2013-01-02 01:00:00    3
20130104 2013-01-04 14:00:00    5
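Note that agg('first') picks the first non-null value of each column independently and relabels the rows with the date strings. If you instead want the complete original rows, with their original index labels as the question asks, and without relying on the frame being sorted, one possible sketch (an alternative technique, not part of the answer above) groups on the midnight-normalized timestamp and selects each day's earliest row with idxmin:

```python
import pandas as pd

# Same sample data as above; 'val' stands in for any other columns.
dates = pd.to_datetime([
    '2013-01-01 01:00', '2013-01-01 05:00', '2013-01-01 14:00',
    '2013-01-02 01:00', '2013-01-02 05:00', '2013-01-04 14:00',
])
df = pd.DataFrame({'date': dates, 'val': range(len(dates))})

# Calendar day of each row (timestamp truncated to midnight).
day = df['date'].dt.normalize()

# For each day, the row label of the earliest timestamp.
# idxmin looks at the values, so df does not need to be sorted.
first_idx = df.groupby(day)['date'].idxmin()

# Select those whole rows; the original index labels are preserved.
newdf = df.loc[first_idx]
print(newdf)
```

Here newdf keeps the original labels 0, 3 and 5, so each row can be traced back to its position in df.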
