Remove All Pandas Rows Except the One Closest to the Start of Each Hour

Question

I have a DataFrame, df:

Date A B C  
x    1 1 1
y    1 1 1
z    1 1 1

The "Date" column is my index, and the timestamps fall at arbitrary times with one-second resolution. I want to remove every row in the DataFrame except, for each hour, the row closest to the start of that hour.

For example, if 12/15/16 15:16:12 is the earliest row in hour 15 of that date, I want every later row in that same hour to be deleted. I then want the same thing repeated for the next hour, and so on.

Is there a fast way to do this in pandas?

Thanks

BENY · Accepted Answer

You can use groupby and head after sort_index:

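s = df.sort_index()  # sort so the earliest timestamp comes first within every hour
s.groupby(s.index.strftime('%Y-%m-%d %H')).head(1)  # build the hour key from the sorted frame, keep the first row per hour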
Out[83]: 
                     A 
Date                   
2016-12-15 15:16:12   1
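A self-contained sketch of the same sort_index + groupby + head(1) idea follows. The timestamps are made up for illustration (an assumption, not data from the question): they are deliberately unsorted and span two hours. The grouping key is an Index of strings, which pandas matches to rows by position, so it is safest to build it from the same sorted frame that is being grouped.

```python
import pandas as pd

# Made-up example data: unsorted timestamps spanning hours 15 and 16.
df = pd.DataFrame(
    {"A": [3, 4, 1, 5, 2]},
    index=pd.DatetimeIndex(
        [
            "2016-12-15 15:56:12",
            "2016-12-15 16:05:00",
            "2016-12-15 15:16:12",
            "2016-12-15 16:45:30",
            "2016-12-15 15:19:12",
        ],
        name="Date",
    ),
)

# Sort by time first, then key each row by "YYYY-mm-dd HH" taken from the
# sorted index so the keys line up row-for-row with the sorted data.
s = df.sort_index()
result = s.groupby(s.index.strftime("%Y-%m-%d %H")).head(1)
print(result)
```

With this data, only the 15:16:12 row (A = 1) and the 16:05:00 row (A = 4) are kept.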

Data input:

df
Out[84]: 
                     A 
Date                   
2016-12-15 15:16:12   1
2016-12-15 15:19:12   1
2016-12-15 15:56:12   1
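An alternative sketch, not part of the accepted answer: floor each timestamp to the start of its hour with DatetimeIndex.floor and keep only the first occurrence of each floored value. The helper name keep_first_per_hour is hypothetical, and the sample data is made up for illustration. Flooring keeps the hour key as a Timestamp instead of a formatted string, and because the whole timestamp is floored, the same hour on different dates stays separate.

```python
import pandas as pd


def keep_first_per_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the earliest row in each clock hour.

    Assumes df has a DatetimeIndex; rows do not need to be sorted beforehand.
    """
    s = df.sort_index()                      # earliest timestamp first
    hour = s.index.floor("h")                # e.g. 15:16:12 -> 15:00:00 on the same date
    return s[~hour.duplicated(keep="first")]  # drop every row but the first in each hour


# Made-up example data: unsorted timestamps spanning hours 15 and 16.
df = pd.DataFrame(
    {"A": [3, 4, 1, 5, 2]},
    index=pd.DatetimeIndex(
        [
            "2016-12-15 15:56:12",
            "2016-12-15 16:05:00",
            "2016-12-15 15:16:12",
            "2016-12-15 16:45:30",
            "2016-12-15 15:19:12",
        ],
        name="Date",
    ),
)
result = keep_first_per_hour(df)
print(result)
```

This selects the same rows as the groupby version (15:16:12 and 16:05:00) without building string keys.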
