Replace all value with latest value of certain status in Pandas

Question

The dataframe I had was as below, and I would like to replace all value in 'revenue' column for 'Terminated' status with the latest value when the status is 'Active' in client level.

    Month   Status     revenue   client
0   11/1/19 Terminated  56.18      A
1   10/1/19 Terminated  84.27      A
2   9/1/19  Terminated  112.36     A
3   8/1/19  Terminated  140.45     A
4   7/1/19  Terminated  140.45     A
5   6/1/19  Terminated  140.45     A
6   5/1/19  Active      140.45     A
7   4/1/19  Active       90.23     A

The expected output is as below:

    Month   Status     revenue   client
0   11/1/19 Terminated  140.45     A
1   10/1/19 Terminated  140.45     A
2   9/1/19  Terminated  140.45     A
3   8/1/19  Terminated  140.45     A
4   7/1/19  Terminated  140.45     A
5   6/1/19  Terminated  140.45     A
6   5/1/19  Active      140.45     A
7   4/1/19  Active       90.23     A

Quang Hoang · Accepted Answer

Here's my approach with map and np.where:

# this requies the data be sorted descendingly by time, as shown in your sample
maps = df[df['Status']=='Active'].groupby('client')['revenue'].first()

df['revenue'] = np.where(df['Status']=='Terminated',
                         df['client'].map(maps),
                         df['revenue'])

Output:

     Month      Status  revenue client
0  11/1/19  Terminated   140.45      A
1  10/1/19  Terminated   140.45      A
2   9/1/19  Terminated   140.45      A
3   8/1/19  Terminated   140.45      A
4   7/1/19  Terminated   140.45      A
5   6/1/19  Terminated   140.45      A
6   5/1/19      Active   140.45      A
7   4/1/19      Active    90.23      A

Option 2:

# sort by time and client:
df = df.sort_values(['client','time'], ascending=[True, False])

# backfill
df['revenue'] = df['revenue'].mask(df['Status']=='Terminated').bfill()

Replace all value with latest value of certain status in Pandas

Answers (2)

Related Questions