Reputation: 195
The dataframe I had was as below, and I would like to replace all value in 'revenue' column for 'Terminated' status with the latest value when the status is 'Active' in client level.
Month Status revenue client
0 11/1/19 Terminated 56.18 A
1 10/1/19 Terminated 84.27 A
2 9/1/19 Terminated 112.36 A
3 8/1/19 Terminated 140.45 A
4 7/1/19 Terminated 140.45 A
5 6/1/19 Terminated 140.45 A
6 5/1/19 Active 140.45 A
7 4/1/19 Active 90.23 A
The expected output is as below:
Month Status revenue client
0 11/1/19 Terminated 140.45 A
1 10/1/19 Terminated 140.45 A
2 9/1/19 Terminated 140.45 A
3 8/1/19 Terminated 140.45 A
4 7/1/19 Terminated 140.45 A
5 6/1/19 Terminated 140.45 A
6 5/1/19 Active 140.45 A
7 4/1/19 Active 90.23 A
Upvotes: 0
Views: 33
Reputation: 494
df.loc[df['Status'] == 'Terminated', 'revenue'] = np.nan
df = df.groupby(['client']).bfill()
The following code is all you need if your dataframe is sorted as shown
Upvotes: 0
Reputation: 150785
Here's my approach with map
and np.where
:
# this requies the data be sorted descendingly by time, as shown in your sample
maps = df[df['Status']=='Active'].groupby('client')['revenue'].first()
df['revenue'] = np.where(df['Status']=='Terminated',
df['client'].map(maps),
df['revenue'])
Output:
Month Status revenue client
0 11/1/19 Terminated 140.45 A
1 10/1/19 Terminated 140.45 A
2 9/1/19 Terminated 140.45 A
3 8/1/19 Terminated 140.45 A
4 7/1/19 Terminated 140.45 A
5 6/1/19 Terminated 140.45 A
6 5/1/19 Active 140.45 A
7 4/1/19 Active 90.23 A
Option 2:
# sort by time and client:
df = df.sort_values(['client','time'], ascending=[True, False])
# backfill
df['revenue'] = df['revenue'].mask(df['Status']=='Terminated').bfill()
Upvotes: 1