Reputation: 689
I have two dfs as below. I want to drop all rows for a userid from PersonDate_List which has datetime less than the min('datetime') for the same userid in df_userid_date.
PersonDate_List (cols={'userid','datetime'})
userid datetime Score
AB-4243 2/1/2016 0
AB-4243 2/2/2016 0
AB-4243 2/3/2016 0
AB-4243 2/4/2016 0
AB-4243 2/5/2016 0
AB-4243 2/6/2016 76
AB-4243 2/7/2016 84
AB-4243 2/8/2016 84
AB-4243 2/9/2016 81
AB-4243 2/10/2016 79
NP-7585 2/1/2016 22
NP-7585 2/2/2016 23.5
NP-7585 2/3/2016 30.15
NP-7585 2/4/2016 30.15
NP-7585 2/5/2016 30.15
NP-7585 2/6/2016 30.15
NP-7585 2/7/2016 0
NP-7585 2/8/2016 0
NP-7585 2/9/2016 22.5
NP-7585 2/10/2016 45.67
VX-4376 2/1/2016 0
VX-4376 2/2/2016 0
VX-4376 2/3/2016 0
VX-4376 2/4/2016 0
VX-4376 2/5/2016 0
VX-4376 2/6/2016 0
VX-4376 2/7/2016 0
VX-4376 2/8/2016 0
VX-4376 2/9/2016 0
VX-4376 2/10/2016 33.13
df_userid_date (cols={'userid','datetime'})
userid datetime
AB-4243 2/6/2016
AB-4243 2/7/2016
AB-4243 2/9/2016
AB-4243 2/10/2016
NP-7585 2/1/2016
NP-7585 2/2/2016
NP-7585 2/3/2016
NP-7585 2/7/2016
NP-7585 2/8/2016
NP-7585 2/9/2016
NP-7585 2/10/2016
VX-4376 2/10/2016
I am looking for results like below;
userid datetime Score
AB-4243 2/6/2016 76
AB-4243 2/7/2016 84
AB-4243 2/8/2016 84
AB-4243 2/9/2016 81
AB-4243 2/10/2016 79
NP-7585 2/1/2016 22
NP-7585 2/2/2016 23.5
NP-7585 2/3/2016 30.15
NP-7585 2/4/2016 30.15
NP-7585 2/5/2016 30.15
NP-7585 2/6/2016 30.15
NP-7585 2/7/2016 0
NP-7585 2/8/2016 0
NP-7585 2/9/2016 22.5
NP-7585 2/10/2016 45.67
VX-4376 2/10/2016 33.13
I tried adding a min date flag to df_userid_date and then merging it but I am not able to get the condition right here.
Upvotes: 0
Views: 82
Reputation: 6556
Try this way:
import numpy as np
df1 = pd.read_csv('PersonDate.csv')
df2 = pd.read_csv('useriddate.csv')
df1['datetime'] = pd.to_datetime(df1['datetime'])
df2['datetime'] = pd.to_datetime(df2['datetime'])
df3 = df1.merge(df2.groupby('userid',as_index=False).agg({'datetime' : np.min}), on='userid')
df3[df3["datetime_x"]>=df3["datetime_y"]]
Output:
userid datetime_x Score datetime_y
5 AB-4243 2016-02-06 76.00 2016-02-06
6 AB-4243 2016-02-07 84.00 2016-02-06
7 AB-4243 2016-02-08 84.00 2016-02-06
8 AB-4243 2016-02-09 81.00 2016-02-06
9 AB-4243 2016-02-10 79.00 2016-02-06
10 NP-7585 2016-02-01 22.00 2016-02-01
11 NP-7585 2016-02-02 23.50 2016-02-01
12 NP-7585 2016-02-03 30.15 2016-02-01
13 NP-7585 2016-02-04 30.15 2016-02-01
14 NP-7585 2016-02-05 30.15 2016-02-01
15 NP-7585 2016-02-06 30.15 2016-02-01
16 NP-7585 2016-02-07 0.00 2016-02-01
17 NP-7585 2016-02-08 0.00 2016-02-01
18 NP-7585 2016-02-09 22.50 2016-02-01
19 NP-7585 2016-02-10 45.67 2016-02-01
29 VX-4376 2016-02-10 33.13 2016-02-10
Upvotes: 2
Reputation: 896
I'm pretty sure there will be a more comprehensive way which result in shorter code. But if no more answer coming, you could use this:
import pandas as pd
import datetime
#Read data
df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')
#Format Datetime column
df1['datetime'] = df1['datetime'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y'))
df2['datetime'] = df2['datetime'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y'))
#Get min datetime in df2 for each id
min(list(df2[df2['userid']=='AB-4243']['datetime']))
temp = pd.DataFrame(list(set(df2['userid'])))
temp.columns = ['userid']
temp['min_datetime'] = temp['userid'].apply(lambda x: min(list(df2[df2['userid']==x]['datetime'])))
temp
#Merge in
df1 = df1.merge(temp, on='userid')
#Slicing
result = df1[df1['datetime'] >= df1['min_datetime']]
Result would be:
df1[df1['datetime'] >= df1['min_datetime']]
Out[65]:
userid datetime Score min_datetime
5 AB-4243 2016-02-06 76.00 2016-02-06
6 AB-4243 2016-02-07 84.00 2016-02-06
7 AB-4243 2016-02-08 84.00 2016-02-06
8 AB-4243 2016-02-09 81.00 2016-02-06
9 AB-4243 2016-02-10 79.00 2016-02-06
10 NP-7585 2016-02-01 22.00 2016-02-01
11 NP-7585 2016-02-02 23.50 2016-02-01
12 NP-7585 2016-02-03 30.15 2016-02-01
13 NP-7585 2016-02-04 30.15 2016-02-01
14 NP-7585 2016-02-05 30.15 2016-02-01
15 NP-7585 2016-02-06 30.15 2016-02-01
16 NP-7585 2016-02-07 0.00 2016-02-01
17 NP-7585 2016-02-08 0.00 2016-02-01
18 NP-7585 2016-02-09 22.50 2016-02-01
19 NP-7585 2016-02-10 45.67 2016-02-01
29 VX-4376 2016-02-10 33.13 2016-02-10
Upvotes: 0