Reut
Reut

Reputation: 1592

Select row based on the closest hour and date

I have two kind of data, one is dataframe and one is right now many lists contain data regard hours of image aqcuisition, so the data look like this:

the pandas dataframe (has dates and hours from 23/6 -04/07, everyday from 07:00-17:00):

>>>timestamp                    date             hour        person_1   person_2 ............
0 2019-06-23 07:00:00          2019-06-23       07:00:00     0.124      0.431
1 2019-06-23 07:03:00          2019-06-23       07:03:00     0.131      0.438
2 2019-06-23 07:06:00          2019-06-23       07:06:00     0.154      0.433
3 2019-06-23 07:09:00          2019-06-23       07:09:00     0.164      0.441
....
2 2019-07-04 16:57:00          2019-07-04       16:57:00     0.864      0.675
3 2019-07-04 17:00:00          2019-07-04       17:00:00     0.823      0.631

the hour each image was taken in format of lists (The number after the word Hour is representative of the date) :

#Hours that the image was taken
Hours23=['07:00','08:00','09:32','10:14','11:15','12:17','13:03','14:41','15:04','17:05']
Hours24=['07:00','08:13','09:02','10:09','11:02','12:03','14:09','16:00','17:00']
Hours25=['08:15','09:02','10:02','11:02','12:02','14:02','15:02','16:00','17:00']
...
Hours3=['07:00','08:02','09:02','10:02','11:02','12:02','13:03','14:03','15:02','16:01','17:00']
Hours4=['07:15','08:02','12:02','13:03']

I want to select rows in the dataframe base on those lists of hours, e.g, to select the the row that is the closest to the Hour of the image . For example, for 23/6, the first hour of image is 07:00, and than 08:00, then 09:32...
so I would like to have in my dataframe only the rows that their hour is the closest to the image hour.
I saw there are ways to filter it ,as mentioned here (Find closest row of DataFrame to given time in Pandas) but I want to take into account the fact that I have differnt hours for each date.

so results should look something like this:

>>>timestamp                    date             hour        person_1   person_2 ............
0 2019-06-23 07:00:00          2019-06-23       07:00:00     0.124      0.431
1 2019-06-23 08:01:00          2019-06-23       07:00:00     0.108      0.370
1 2019-06-23 09:32:00          2019-06-23       07:00:00     0.101      0.360
...

Upvotes: 2

Views: 833

Answers (1)

Chris
Chris

Reputation: 16137

Your list of times isn't an ideal format, so I'll show you the approach for one date, where the dataframe h23 is manually built. You could code this to build out a dataframe that holds all of the times if you want.

The basic premise is to make a dataframe h23 for example, with all of the timestamps of the photos. Use this to merge with the df timestamp, using pd.merge_asof which has the ability to do the closest time match. nearest means the actual time could be before of after, but there are options for backward and forward as well if the time needs to be the nearest time before the event.

import pandas as pd

df = pd.DataFrame({'timestamp': {0: '2019-06-23 07:00:00',
  1: '2019-06-23 07:03:00',
  2: '2019-06-23 07:06:00',
  3: '2019-06-23 07:09:00'},
 'date': {0: '2019-06-23', 1: '2019-06-23', 2: '2019-06-23', 3: '2019-06-23'},
 'hour': {0: '07:00:00', 1: '07:03:00', 2: '07:06:00', 3: '07:09:00'},
 'person_1': {0: 0.124, 1: 0.131, 2: 0.154, 3: 0.16399999999999998},
 'person_2': {0: 0.431, 1: 0.43799999999999994, 2: 0.433, 3: 0.441}})

df['timestamp'] = pd.to_datetime(df['timestamp'])

Hours23=['07:00','08:00','09:32','10:14','11:15','12:17','13:03','14:41','15:04','17:05']

h23 = pd.DataFrame({'Time':Hours23, 'Day':'2019-06-23'})

h23['timestamp'] = pd.to_datetime(h23['Day'] + " " + h23['Time'])

pd.merge_asof(h23,df, on='timestamp', direction='nearest').drop(columns=['Day','Time'])

Output

              timestamp      date       hour    person_1    person_2
0   2019-06-23 07:00:00 2019-06-23  07:00:00    0.124   0.431
1   2019-06-23 08:00:00 2019-06-23  07:09:00    0.164   0.441
2   2019-06-23 09:32:00 2019-06-23  07:09:00    0.164   0.441
3   2019-06-23 10:14:00 2019-06-23  07:09:00    0.164   0.441
4   2019-06-23 11:15:00 2019-06-23  07:09:00    0.164   0.441
5   2019-06-23 12:17:00 2019-06-23  07:09:00    0.164   0.441
6   2019-06-23 13:03:00 2019-06-23  07:09:00    0.164   0.441
7   2019-06-23 14:41:00 2019-06-23  07:09:00    0.164   0.441
8   2019-06-23 15:04:00 2019-06-23  07:09:00    0.164   0.441
9   2019-06-23 17:05:00 2019-06-23  07:09:00    0.164   0.441

Upvotes: 3

Related Questions