Reputation: 2626
I got a dataframe indexed with datetime index. That index contains several times the same dates, meaning same year, month and day, the hour may differ. I'd like to select only the last of each and every existing day in the index.
I think I'm on the right path, but I miss something in the logic....
so the dataframe dfmatches has this index:
In[166]: dfmatches.index
Out[165]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-03 16:58:49, ..., 2014-11-26 11:22:59]
Length: 597, Freq: None, Timezone: None
a sample of it would be :
2014-11-04 10:10:09 0.005169
2014-11-10 08:11:50 0.005169
2014-11-10 13:42:03 0.005169
2014-11-11 16:53:04 0.005169
2014-11-12 17:49:40 0.005169
2014-11-13 11:30:10 0.005169
2014-11-17 09:35:45 0.005169
2014-11-18 10:34:36 0.005169
2014-11-18 15:12:20 0.005169
2014-11-18 17:14:16 0.005169
2014-11-20 16:37:47 0.005169
2014-11-24 10:10:55 0.005169
2014-11-25 18:00:18 0.005169
2014-11-26 10:07:53 0.005169
2014-11-26 11:22:59 0.005169
I'd like to have only the last index of each day already existing in the index. So in the sample provided :
2014-11-04 10:10:09 0.005169
2014-11-10 13:42:03 0.005169
2014-11-11 16:53:04 0.005169
2014-11-12 17:49:40 0.005169
2014-11-13 11:30:10 0.005169
2014-11-17 09:35:45 0.005169
2014-11-18 17:14:16 0.005169
2014-11-20 16:37:47 0.005169
2014-11-24 10:10:55 0.005169
2014-11-25 18:00:18 0.005169
2014-11-26 11:22:59 0.005169
I tried to create a mask :
mask = (dfmatches.index.shift(1,'D').day > dfmatches.index.day)
then do a
dfmatches.loc[mask]
but that doesn't return what I want, I still have duplicates days...
I'm pretty sure the fail is in the mask logic I implement.....
maybe not, in any case hints are greatly appreciated !
Upvotes: 0
Views: 1451
Reputation: 20467
If you groupby
the index date, you can select the last item for each group, e.g.:
dfmatches.groupby(dfmatches.index.date).last()
Upvotes: 3
Reputation: 8483
I'm thinking you would need to separate your datetime into a date and a time
So this 2014-11-04 10:10:09 would become 2014-11-04 and 10:10:09
If you did that, you could do something like the following
df.sort(['date','time']).groupby('date').apply(lambda x : x[-1:])
Upvotes: 0