Reputation: 111
I have a data frame:
Id modification_date date1 date2 estimate_date
aaa 2022-07-19T13:27:01Z 2022-07-21T20:01:00Z 2022-07-21T22:59:00Z
aaa 2022-07-20T13:49:21Z 2022-07-21T20:01:00Z 2022-07-21T22:59:00Z 2022-07-22T01:06:05Z
aaa 2022-07-21T14:24:20Z 2022-07-21T14:01:00Z 2022-07-21T22:59:00Z 2022-07-21T19:06:33Z
aaa 2022-07-21T18:52:23Z 2022-07-21T14:01:00Z 2022-07-21T22:59:00Z
bbb 2022-07-18T21:04:13Z 2022-07-20T13:01:00Z 2022-07-21T00:59:00Z
bbb 2022-07-19T20:30:20Z 2022-07-20T13:01:00Z 2022-07-21T00:59:00Z
bbb 2022-07-20T17:37:58Z 2022-07-20T13:01:00Z 2022-07-21T00:59:00Z
ddd 2022-07-11T14:40:09Z 2022-07-12T22:01:00Z 2022-07-13T00:59:00Z 2022-07-12T23:00:33Z
ddd 2022-07-11T17:29:11Z 2022-07-12T22:01:00Z 2022-07-13T00:59:00Z 2022-07-13T00:37:21Z
ddd 2022-07-13T20:36:11Z 2022-07-12T22:01:00Z 2022-07-13T00:59:00Z
eee 2022-07-14T14:44:08Z 2022-07-18T17:01:00Z 2022-07-18T19:59:00Z 2022-07-18T21:36:38Z
eee 2022-07-18T19:45:42Z 2022-07-21T17:01:00Z 2022-07-21T19:59:00Z 2022-07-21T17:01:00Z
eee 2022-07-19T20:46:39Z 2022-07-21T17:01:00Z 2022-07-21T19:59:00Z
eee 2022-07-21T13:35:07Z 2022-07-21T17:01:00Z 2022-07-21T19:59:00Z 2022-07-21T20:01:41Z
eee 2022-07-21T21:37:24Z 2022-07-21T17:01:00Z 2022-07-21T19:59:00Z
I want to keep the latest modified record, but also, I dont want to miss on the least estimated date for that day's date1 and date2.
What I am doing is first sorting on the basis of modification date on group of id, and then again grouping date1 and date2 together where the last modified record's date1 and date2 group is same. and sorting on that group to get minimum estimation date.
What I am expecting is:
Id modification_date date1 date2 new_estimate_date
aaa 2022-07-21T18:52:23Z 2022-07-21T14:01:00Z 2022-07-21T22:59:00Z 2022-07-21T19:06:33Z
bbb 2022-07-20T17:37:58Z 2022-07-20T13:01:00Z 2022-07-21T00:59:00Z
ddd 2022-07-13T20:36:11Z 2022-07-12T22:01:00Z 2022-07-13T00:59:00Z 2022-07-12T23:00:33Z
eee 2022-07-21T21:37:24Z 2022-07-21T17:01:00Z 2022-07-21T19:59:00Z 2022-07-21T17:01:00Z
I have tried:
def estimatedCT(df):
for i, j in df.sort_values('modification_date', ascending=False).groupby('id'):
# print(j.head(1))
for k, l in j.sort_values('estimate_date', ascending=True, na_position='last').groupby(['date1', 'date2']):
final_val = l['estimate_date'].head(1)
print(final_val)
if pd.notnull(final_val).all():
print(final_val)
return final_val
else:
print(np.nan)
return np.nan
df['new_estimate_date'] = df['estimate_date'].apply(lambda x: estimatedCT(x))
But it is somehow giving me values of the last group in every record. Please suggest
Upvotes: 1
Views: 104