I have a DataFrame, df:
Date A B C
x 1 1 1
y 1 1 1
z 1 1 1
The "Date" column is my index, and the timestamps are arbitrary times with one-second resolution. I want to remove every row in the DataFrame except the one closest to the start of each hour.
For example, if 12/15/16 15:16:12 is the earliest row in hour 15 of that date, I want every later row in that hour deleted. I then want the same done for the next hour, and so on.
Is there a fast way to do this in pandas?
Thanks
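Since every timestamp in a given hour is at or after that hour's :00:00 mark, "closest to the start of the hour" is simply the earliest timestamp in that hour. A minimal sketch of that rule, assuming the index is a DatetimeIndex named "Date" (the sample timestamps below are made up): group the timestamps by calendar hour, take the minimum per group, and select those rows with .loc. This variant does not need the frame to be sorted first.

```python
import pandas as pd

# Hypothetical sample data: unsorted second-level timestamps in the "Date" index
df = pd.DataFrame(
    {"A": [1, 1, 1, 1], "B": [1, 1, 1, 1], "C": [1, 1, 1, 1]},
    index=pd.DatetimeIndex(
        ["2016-12-15 15:56:12", "2016-12-15 15:16:12",
         "2016-12-15 16:05:00", "2016-12-15 15:19:12"],
        name="Date",
    ),
)

# Label each row with its calendar hour, e.g. "2016-12-15 15"
hour_key = df.index.strftime("%Y-%m-%d %H")

# Earliest timestamp within each hour; the key is matched to rows by position
earliest = df.index.to_series().groupby(hour_key).min()

# Select the rows at those timestamps (groups come back in hour order)
result = df.loc[earliest.values]
print(result)
```

If the same timestamp appears more than once, .loc returns all rows carrying it, so duplicates would need an extra step.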
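You can use groupby and head after sort_index. Build the hour key from the sorted frame, because an array passed to groupby is matched to rows by position, not by index label:
s = df.sort_index()
s.groupby(s.index.strftime('%Y-%m-%d %H')).head(1)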
Out[83]:
A
Date
2016-12-15 15:16:12 1
Input data:
df
Out[84]:
A
Date
2016-12-15 15:16:12 1
2016-12-15 15:19:12 1
2016-12-15 15:56:12 1
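The sample input above happens to be sorted already, which hides one detail: a key such as index.strftime(...) is aligned with the rows by position, so it should be computed from the same sorted frame that is grouped. The sketch below uses a deliberately out-of-order, made-up input to show this, and compares the groupby/head approach with an alternative swapped in here (a boolean mask built with Index.duplicated, which keeps the first row of each hour without a groupby step).

```python
import pandas as pd

# Hypothetical input, deliberately out of time order
df = pd.DataFrame(
    {"A": [1, 1, 1, 1]},
    index=pd.DatetimeIndex(
        ["2016-12-15 15:56:12", "2016-12-15 16:40:00",
         "2016-12-15 15:16:12", "2016-12-15 16:02:30"],
        name="Date",
    ),
)

# Sort once and derive the hour key from the sorted frame,
# so each label lines up with its row positionally.
# pd.Index(...) guarantees an Index, which provides .duplicated().
s = df.sort_index()
hour_key = pd.Index(s.index.strftime("%Y-%m-%d %H"))

# Option 1: groupby + head(1), as in the answer
via_head = s.groupby(hour_key).head(1)

# Option 2: keep only the first occurrence of each hour label
via_mask = s[~hour_key.duplicated()]

print(via_mask)
print(via_mask.equals(via_head))  # both approaches select the same rows
```

Both options rely on the sort: after sort_index, the first row carrying a given hour label is the earliest one in that hour.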