Reputation: 1592
I have two kind of data, one is dataframe and one is right now many lists contain data regard hours of image aqcuisition, so the data look like this:
the pandas dataframe (has dates and hours from 23/6 -04/07, everyday from 07:00-17:00):
>>>timestamp date hour person_1 person_2 ............
0 2019-06-23 07:00:00 2019-06-23 07:00:00 0.124 0.431
1 2019-06-23 07:03:00 2019-06-23 07:03:00 0.131 0.438
2 2019-06-23 07:06:00 2019-06-23 07:06:00 0.154 0.433
3 2019-06-23 07:09:00 2019-06-23 07:09:00 0.164 0.441
....
2 2019-07-04 16:57:00 2019-07-04 16:57:00 0.864 0.675
3 2019-07-04 17:00:00 2019-07-04 17:00:00 0.823 0.631
the hour each image was taken in format of lists (The number after the word Hour is representative of the date) :
#Hours that the image was taken
Hours23=['07:00','08:00','09:32','10:14','11:15','12:17','13:03','14:41','15:04','17:05']
Hours24=['07:00','08:13','09:02','10:09','11:02','12:03','14:09','16:00','17:00']
Hours25=['08:15','09:02','10:02','11:02','12:02','14:02','15:02','16:00','17:00']
...
Hours3=['07:00','08:02','09:02','10:02','11:02','12:02','13:03','14:03','15:02','16:01','17:00']
Hours4=['07:15','08:02','12:02','13:03']
I want to select rows in the dataframe base on those lists of hours, e.g, to select the the row that is the closest to the Hour of the image .
For example, for 23/6, the first hour of image is 07:00, and than 08:00, then 09:32...
so I would like to have in my dataframe only the rows that their hour is the closest to the image hour.
I saw there are ways to filter it ,as mentioned here (Find closest row of DataFrame to given time in Pandas) but I want to take into account the fact that I have differnt hours for each date.
so results should look something like this:
>>>timestamp date hour person_1 person_2 ............
0 2019-06-23 07:00:00 2019-06-23 07:00:00 0.124 0.431
1 2019-06-23 08:01:00 2019-06-23 07:00:00 0.108 0.370
1 2019-06-23 09:32:00 2019-06-23 07:00:00 0.101 0.360
...
Upvotes: 2
Views: 833
Reputation: 16137
Your list of times isn't an ideal format, so I'll show you the approach for one date, where the dataframe h23
is manually built. You could code this to build out a dataframe that holds all of the times if you want.
The basic premise is to make a dataframe h23
for example, with all of the timestamps of the photos. Use this to merge with the df timestamp, using pd.merge_asof
which has the ability to do the closest time match. nearest
means the actual time could be before of after, but there are options for backward
and forward
as well if the time needs to be the nearest time before the event.
import pandas as pd
df = pd.DataFrame({'timestamp': {0: '2019-06-23 07:00:00',
1: '2019-06-23 07:03:00',
2: '2019-06-23 07:06:00',
3: '2019-06-23 07:09:00'},
'date': {0: '2019-06-23', 1: '2019-06-23', 2: '2019-06-23', 3: '2019-06-23'},
'hour': {0: '07:00:00', 1: '07:03:00', 2: '07:06:00', 3: '07:09:00'},
'person_1': {0: 0.124, 1: 0.131, 2: 0.154, 3: 0.16399999999999998},
'person_2': {0: 0.431, 1: 0.43799999999999994, 2: 0.433, 3: 0.441}})
df['timestamp'] = pd.to_datetime(df['timestamp'])
Hours23=['07:00','08:00','09:32','10:14','11:15','12:17','13:03','14:41','15:04','17:05']
h23 = pd.DataFrame({'Time':Hours23, 'Day':'2019-06-23'})
h23['timestamp'] = pd.to_datetime(h23['Day'] + " " + h23['Time'])
pd.merge_asof(h23,df, on='timestamp', direction='nearest').drop(columns=['Day','Time'])
Output
timestamp date hour person_1 person_2
0 2019-06-23 07:00:00 2019-06-23 07:00:00 0.124 0.431
1 2019-06-23 08:00:00 2019-06-23 07:09:00 0.164 0.441
2 2019-06-23 09:32:00 2019-06-23 07:09:00 0.164 0.441
3 2019-06-23 10:14:00 2019-06-23 07:09:00 0.164 0.441
4 2019-06-23 11:15:00 2019-06-23 07:09:00 0.164 0.441
5 2019-06-23 12:17:00 2019-06-23 07:09:00 0.164 0.441
6 2019-06-23 13:03:00 2019-06-23 07:09:00 0.164 0.441
7 2019-06-23 14:41:00 2019-06-23 07:09:00 0.164 0.441
8 2019-06-23 15:04:00 2019-06-23 07:09:00 0.164 0.441
9 2019-06-23 17:05:00 2019-06-23 07:09:00 0.164 0.441
Upvotes: 3