qtscho
qtscho

Reputation: 25

Checking if date is between multiple date ranges

I have 2 dataframes:

df1 = pd.DataFrame({'name': ['Bob', 'Jenny', 'Larry', 'Sam', 'Ashley', 'John', 'Bob', 'Jenny', 'Larry', 'Sam', 'Ashley', 'John'], 'start_time': ['2020-06-01 08:01:45', '2020-06-01 08:55:47', '2020-06-01 09:07:00', '2020-06-01 09:12:00', '2020-06-01 09:12:40', '2020-06-01 09:22:05', '2020-06-02 08:01:45', '2020-06-02 08:55:47', '2020-06-02 09:07:00', '2020-06-02 09:12:00', '2020-06-02 09:12:40', '2020-06-02 09:22:05'], 'end_time':['2020-06-01 16:15:00', '2020-06-01 17:00:00', '2020-06-01 17:05:04', '2020-06-01 17:12:12', '2020-06-01 17:45:00', '2020-06-01 19:05:00', '2020-06-02 16:15:00', '2020-06-02 17:00:00', '2020-06-02 17:05:04', '2020-06-02 17:12:12', '2020-06-02 17:45:00', '2020-06-02 19:05:00']})
df2 = pd.DataFrame({'name': ['Bob', 'Bob', 'Jenny', 'Ashley', 'Bob', 'John', 'John', 'Jenny', 'Sam'], 'time': ['2020-06-01 08:07:00', '2020-06-01 08:40:47', '2020-06-01 8:50:00', '2020-06-01 10:10:30', '2020-06-01 14:12:33', '2020-06-01 14:41:33', '2020-06-01 15:12:33', '2020-06-01 18:44:33', '2020-06-01 22:12:07']})

I'm trying to check and see if a given 'time' in df2 is contained within any time span with the same name (i.e. I'm only trying to check Bob's entries in df2 against Bob's 'start_time's and 'end_time's in df1) and return a boolean value in a new column in df2. Here's what I'd expect as output:

df2 = pd.DataFrame({'name': ['Bob', 'Bob', 'Jenny', 'Ashley', 'Bob', 'John', 'John', 'Jenny', 'Sam'], 'time': ['2020-06-01 08:07:00', '2020-06-01 08:40:47', '2020-06-01 8:50:00', '2020-06-01 10:10:30', '2020-06-01 14:12:33', '2020-06-01 14:41:33', '2020-06-01 15:12:33', '2020-06-01 18:44:33', '2020-06-01 22:12:07'], 'legal': [True, True, False, True, True, True, True, False, False]})

I've tried using if functions, but I keep getting a ValueError: Can only compare identically-labeled Series objects. Any suggestions?

Upvotes: 2

Views: 255

Answers (2)

wwnde
wwnde

Reputation: 26676

res = df2.merge(df1, on='name', how='left').drop_duplicates( subset='time', keep='first').sort_values(by='name')
res['legal']=res['time'].between(res['start_time'],res['end_time'])
res.drop(columns=['start_time','end_time'], inplace=True)

print(res)

     name                 time  legal
6   Ashley  2020-06-01 10:10:30   True
0      Bob  2020-06-01 08:07:00   True
2      Bob  2020-06-01 08:40:47   True
8      Bob  2020-06-01 14:12:33   True
4    Jenny   2020-06-01 8:50:00  False
14   Jenny  2020-06-01 18:44:33  False
10    John  2020-06-01 14:41:33   True
12    John  2020-06-01 15:12:33   True
16     Sam  2020-06-01 22:12:07  False

Upvotes: 0

ScootCork
ScootCork

Reputation: 3686

You could do so by doing a left join, compare and then aggregate taking the max of 'legal':

df3 = df2.merge(df1, on='name', how='left')

df3['legal'] = (df3['time'] >= df3['start_time']) & (df3['time'] <= df3['end_time'])

print(df3[['name', 'time', 'legal']].groupby(['name', 'time']).max().reset_index())

prints

     name                 time  legal
0  Ashley  2020-06-01 10:10:30   True
1     Bob  2020-06-01 08:07:00   True
2     Bob  2020-06-01 08:40:47   True
3     Bob  2020-06-01 14:12:33   True
4   Jenny  2020-06-01 18:44:33  False
5   Jenny   2020-06-01 8:50:00  False
6    John  2020-06-01 14:41:33   True
7    John  2020-06-01 15:12:33   True
8     Sam  2020-06-01 22:12:07  False

Note: when comparing strings it is important to have consistent formatting of the datetimes.

Upvotes: 2

Related Questions