Reputation: 486
In Pandas, I have a DataFrame with datetimes in a column (not the index), which span several days and are at irregular time intervals (i.e. not periodic). I want to return the first value from each day. So if my datetime column looked like:
2013-01-01 01:00
2013-01-01 05:00
2013-01-01 14:00
2013-01-02 01:00
2013-01-02 05:00
2013-01-04 14:00
The command I'm looking for would return the DataFrame rows for the following datetimes:
2013-01-01 01:00
2013-01-02 01:00
2013-01-04 14:00
Upvotes: 4
Views: 1202
Reputation: 879093
With this setup:
import pandas as pd
data = '''\
2013-01-01 01:00
2013-01-01 05:00
2013-01-01 14:00
2013-01-02 01:00
2013-01-02 05:00
2013-01-04 14:00'''
dates = pd.to_datetime(data.splitlines())
df = pd.DataFrame({'date': dates, 'val': range(len(dates))})
>>> df
date val
0 2013-01-01 01:00:00 0
1 2013-01-01 05:00:00 1
2 2013-01-01 14:00:00 2
3 2013-01-02 01:00:00 3
4 2013-01-02 05:00:00 4
5 2013-01-04 14:00:00 5
You can produce the desired DataFrame using groupby and agg:
grouped = df.groupby([d.strftime('%Y%m%d') for d in df['date']])
newdf = grouped.agg('first')
print(newdf)
yields
date val
20130101 2013-01-01 01:00:00 0
20130102 2013-01-02 01:00:00 3
20130104 2013-01-04 14:00:00 5
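Note that agg('first') takes the first non-null value of each column within a group, so with missing data the result may mix values from different rows. If you want whole rows with their original index instead, one possible variant (a sketch, assuming the 'date' column has a datetime64 dtype so the .dt accessor is available; the names ordered and first_per_day are only illustrative) is to group on the normalized date and take head(1):

```python
import pandas as pd

dates = pd.to_datetime([
    "2013-01-01 01:00",
    "2013-01-01 05:00",
    "2013-01-01 14:00",
    "2013-01-02 01:00",
    "2013-01-02 05:00",
    "2013-01-04 14:00",
])
df = pd.DataFrame({"date": dates, "val": range(len(dates))})

# Sort by time so that "first" means "earliest" even if rows arrive out of order.
ordered = df.sort_values("date")

# dt.normalize() sets the time part to midnight, giving one key per calendar day.
# head(1) keeps the first full row of each group and preserves the original index.
first_per_day = ordered.groupby(ordered["date"].dt.normalize()).head(1)
print(first_per_day)
```

Here the result keeps the original integer index (0, 3, 5) rather than using the day string as the index.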
Upvotes: 2