euri10
euri10

Reputation: 2626

selecting last of day in a pandas index

I got a dataframe indexed with datetime index. That index contains several times the same dates, meaning same year, month and day, the hour may differ. I'd like to select only the last of each and every existing day in the index.

I think I'm on the right path, but I miss something in the logic....

so the dataframe dfmatches has this index:

In[166]: dfmatches.index
Out[165]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-03 16:58:49, ..., 2014-11-26 11:22:59]
Length: 597, Freq: None, Timezone: None

a sample of it would be :

2014-11-04 10:10:09    0.005169
2014-11-10 08:11:50    0.005169
2014-11-10 13:42:03    0.005169
2014-11-11 16:53:04    0.005169
2014-11-12 17:49:40    0.005169
2014-11-13 11:30:10    0.005169
2014-11-17 09:35:45    0.005169
2014-11-18 10:34:36    0.005169
2014-11-18 15:12:20    0.005169
2014-11-18 17:14:16    0.005169
2014-11-20 16:37:47    0.005169
2014-11-24 10:10:55    0.005169
2014-11-25 18:00:18    0.005169
2014-11-26 10:07:53    0.005169
2014-11-26 11:22:59    0.005169

I'd like to have only the last index of each day already existing in the index. So in the sample provided :

2014-11-04 10:10:09    0.005169
2014-11-10 13:42:03    0.005169
2014-11-11 16:53:04    0.005169
2014-11-12 17:49:40    0.005169
2014-11-13 11:30:10    0.005169
2014-11-17 09:35:45    0.005169
2014-11-18 17:14:16    0.005169
2014-11-20 16:37:47    0.005169
2014-11-24 10:10:55    0.005169
2014-11-25 18:00:18    0.005169
2014-11-26 11:22:59    0.005169

I tried to create a mask :

mask = (dfmatches.index.shift(1,'D').day > dfmatches.index.day)

then do a

dfmatches.loc[mask]

but that doesn't return what I want, I still have duplicates days...

I'm pretty sure the fail is in the mask logic I implement.....

maybe not, in any case hints are greatly appreciated !

Upvotes: 0

Views: 1451

Answers (2)

Matti John
Matti John

Reputation: 20467

If you groupby the index date, you can select the last item for each group, e.g.:

dfmatches.groupby(dfmatches.index.date).last()

Upvotes: 3

Bob Haffner
Bob Haffner

Reputation: 8483

I'm thinking you would need to separate your datetime into a date and a time

So this 2014-11-04 10:10:09 would become 2014-11-04 and 10:10:09

If you did that, you could do something like the following

df.sort(['date','time']).groupby('date').apply(lambda x : x[-1:])

Upvotes: 0

Related Questions