dingaro

Reputation: 2342

Value calculation based on date in DataFrame in Python Pandas?

I have a DataFrame with clients' agreements, like below:

import pandas as pd

rng = pd.date_range('2020-12-01', periods=5, freq='D')
df = pd.DataFrame({ "ID" : ["1", "2", "1", "2", "2"],
                   "value" : [100, 200, 300, 400, 500],
                   "status" : ["active", "finished", "active", "finished", "active"],
                   "Date": rng})

I need to create a new DataFrame with values calculated from the df above:

  1. New1 = Value of the last agreement with status 'active'
  2. New2 = Value of the last agreement with status 'finished'

To be more precise, I need to create a df like the one below:

[image: expected output DataFrame]

Upvotes: 2

Views: 100

Answers (2)

jezrael

Reputation: 862581

Sort with DataFrame.sort_values by both columns, then use DataFrame.pivot_table with the aggregate function last:

df = (df.sort_values(['ID','Date'])
        .pivot_table(index='ID', columns='status', values='value', aggfunc='last')
        .rename(columns={'active':'New1','finished':'New2'})[['New1','New2']]
        .reset_index()
        .rename_axis(None,axis=1)
       )
print (df)
  ID   New1   New2
0  1  300.0    NaN
1  2  500.0  400.0

If the datetimes are already sorted within each group, the solution is simpler:

df = (df.pivot_table(index='ID', columns='status', values='value', aggfunc='last')
        .rename(columns={'active':'New1','finished':'New2'})[['New1','New2']]
        .reset_index()
        .rename_axis(None,axis=1)
       )
print (df)
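
For reference, here is a self-contained sketch of the sorted variant that can be run on the question's sample data. The function name last_value_per_status and the variable agreements are hypothetical, chosen only for illustration. The extra reindex step is an assumption on my part: it keeps both output columns present (filled with NaN) even when one of the two statuses never occurs in the input, where selecting [['New1','New2']] would raise a KeyError.

```python
import pandas as pd


def last_value_per_status(df):
    """Return one row per ID with the latest 'active' and 'finished' values.

    Hypothetical helper for illustration; assumes the columns
    ID, value, status and Date from the question.
    """
    return (df.sort_values(['ID', 'Date'])            # make "last" mean "latest Date"
              .pivot_table(index='ID', columns='status',
                           values='value', aggfunc='last')
              .reindex(columns=['active', 'finished'])  # keep both columns even if one status is absent
              .rename(columns={'active': 'New1', 'finished': 'New2'})
              .reset_index()
              .rename_axis(None, axis=1))


# Sample data copied from the question
rng = pd.date_range('2020-12-01', periods=5, freq='D')
agreements = pd.DataFrame({"ID": ["1", "2", "1", "2", "2"],
                           "value": [100, 200, 300, 400, 500],
                           "status": ["active", "finished", "active", "finished", "active"],
                           "Date": rng})

result = last_value_per_status(agreements)
print(result)
```

Wrapping the chain in a function also avoids overwriting df, so the original agreements stay available for other calculations.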

Upvotes: 2

U13-Forward

Reputation: 71570

Try this longer approach:

df1 = df.loc[df['status'] == "active"]
df2 = df.loc[df['status'] == "finished"]
df1 = df1.groupby("ID")['value'].last()
df2 = df2.groupby("ID")['value'].last()
IDs = df["ID"].drop_duplicates()
new_df = pd.DataFrame({"ID": IDs, "New1": df1.reindex(IDs).tolist(), "New2": df2.reindex(IDs).tolist()})
print(new_df)

Output:

  ID  New1   New2
0  1   300    NaN
1  2   500  400.0
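
One caveat: groupby(...).last() returns the last row in the current row order, so the code above gives the latest agreement only when the rows are already ordered by Date. A minimal sketch of an order-independent variant, assuming the same column names as in the question, sorts by Date first and then reshapes with unstack. The shuffled frame exists only to demonstrate the point and is not part of the original data.

```python
import pandas as pd

# Sample data copied from the question
rng = pd.date_range('2020-12-01', periods=5, freq='D')
df = pd.DataFrame({"ID": ["1", "2", "1", "2", "2"],
                   "value": [100, 200, 300, 400, 500],
                   "status": ["active", "finished", "active", "finished", "active"],
                   "Date": rng})

# Deliberately put the rows out of date order (illustration only)
shuffled = df.iloc[[4, 2, 0, 3, 1]]

new_df = (shuffled.sort_values('Date')                  # restore chronological order
                  .groupby(['ID', 'status'])['value']
                  .last()                               # latest value per (ID, status)
                  .unstack('status')                    # statuses become columns
                  .reindex(columns=['active', 'finished'])
                  .rename(columns={'active': 'New1', 'finished': 'New2'})
                  .reset_index()
                  .rename_axis(None, axis=1))
print(new_df)
```

Because unstack introduces a NaN for ID 1, both value columns come back as floats here.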

Upvotes: 4
