Reputation: 25
I have 2 dataframes:
df1 = pd.DataFrame({'name': ['Bob', 'Jenny', 'Larry', 'Sam', 'Ashley', 'John', 'Bob', 'Jenny', 'Larry', 'Sam', 'Ashley', 'John'], 'start_time': ['2020-06-01 08:01:45', '2020-06-01 08:55:47', '2020-06-01 09:07:00', '2020-06-01 09:12:00', '2020-06-01 09:12:40', '2020-06-01 09:22:05', '2020-06-02 08:01:45', '2020-06-02 08:55:47', '2020-06-02 09:07:00', '2020-06-02 09:12:00', '2020-06-02 09:12:40', '2020-06-02 09:22:05'], 'end_time':['2020-06-01 16:15:00', '2020-06-01 17:00:00', '2020-06-01 17:05:04', '2020-06-01 17:12:12', '2020-06-01 17:45:00', '2020-06-01 19:05:00', '2020-06-02 16:15:00', '2020-06-02 17:00:00', '2020-06-02 17:05:04', '2020-06-02 17:12:12', '2020-06-02 17:45:00', '2020-06-02 19:05:00']})
df2 = pd.DataFrame({'name': ['Bob', 'Bob', 'Jenny', 'Ashley', 'Bob', 'John', 'John', 'Jenny', 'Sam'], 'time': ['2020-06-01 08:07:00', '2020-06-01 08:40:47', '2020-06-01 8:50:00', '2020-06-01 10:10:30', '2020-06-01 14:12:33', '2020-06-01 14:41:33', '2020-06-01 15:12:33', '2020-06-01 18:44:33', '2020-06-01 22:12:07']})
I'm trying to check and see if a given 'time'
in df2
is contained within any time span with the same name (i.e. I'm only trying to check Bob's entries in df2
against Bob's 'start_time'
s and 'end_time'
s in df1
) and return a boolean value in a new column in df2
. Here's what I'd expect as output:
df2 = pd.DataFrame({'name': ['Bob', 'Bob', 'Jenny', 'Ashley', 'Bob', 'John', 'John', 'Jenny', 'Sam'], 'time': ['2020-06-01 08:07:00', '2020-06-01 08:40:47', '2020-06-01 8:50:00', '2020-06-01 10:10:30', '2020-06-01 14:12:33', '2020-06-01 14:41:33', '2020-06-01 15:12:33', '2020-06-01 18:44:33', '2020-06-01 22:12:07'], 'legal': [True, True, False, True, True, True, True, False, False]})
I've tried using if functions, but I keep getting a ValueError: Can only compare identically-labeled Series objects. Any suggestions?
Upvotes: 2
Views: 255
Reputation: 26676
res = df2.merge(df1, on='name', how='left').drop_duplicates( subset='time', keep='first').sort_values(by='name')
res['legal']=res['time'].between(res['start_time'],res['end_time'])
res.drop(columns=['start_time','end_time'], inplace=True)
print(res)
name time legal
6 Ashley 2020-06-01 10:10:30 True
0 Bob 2020-06-01 08:07:00 True
2 Bob 2020-06-01 08:40:47 True
8 Bob 2020-06-01 14:12:33 True
4 Jenny 2020-06-01 8:50:00 False
14 Jenny 2020-06-01 18:44:33 False
10 John 2020-06-01 14:41:33 True
12 John 2020-06-01 15:12:33 True
16 Sam 2020-06-01 22:12:07 False
Upvotes: 0
Reputation: 3686
You could do so by doing a left join, compare and then aggregate taking the max of 'legal':
df3 = df2.merge(df1, on='name', how='left')
df3['legal'] = (df3['time'] >= df3['start_time']) & (df3['time'] <= df3['end_time'])
print(df3[['name', 'time', 'legal']].groupby(['name', 'time']).max().reset_index())
prints
name time legal
0 Ashley 2020-06-01 10:10:30 True
1 Bob 2020-06-01 08:07:00 True
2 Bob 2020-06-01 08:40:47 True
3 Bob 2020-06-01 14:12:33 True
4 Jenny 2020-06-01 18:44:33 False
5 Jenny 2020-06-01 8:50:00 False
6 John 2020-06-01 14:41:33 True
7 John 2020-06-01 15:12:33 True
8 Sam 2020-06-01 22:12:07 False
Note: when comparing strings it is important to have consistent formatting of the datetimes.
Upvotes: 2