Jenny Jing Yu
Jenny Jing Yu

Reputation: 195

Replace all value with latest value of certain status in Pandas

The dataframe I had was as below, and I would like to replace all value in 'revenue' column for 'Terminated' status with the latest value when the status is 'Active' in client level.

    Month   Status     revenue   client
0   11/1/19 Terminated  56.18      A
1   10/1/19 Terminated  84.27      A
2   9/1/19  Terminated  112.36     A
3   8/1/19  Terminated  140.45     A
4   7/1/19  Terminated  140.45     A
5   6/1/19  Terminated  140.45     A
6   5/1/19  Active      140.45     A
7   4/1/19  Active       90.23     A

The expected output is as below:

    Month   Status     revenue   client
0   11/1/19 Terminated  140.45     A
1   10/1/19 Terminated  140.45     A
2   9/1/19  Terminated  140.45     A
3   8/1/19  Terminated  140.45     A
4   7/1/19  Terminated  140.45     A
5   6/1/19  Terminated  140.45     A
6   5/1/19  Active      140.45     A
7   4/1/19  Active       90.23     A

Upvotes: 0

Views: 33

Answers (2)

marwan
marwan

Reputation: 494

df.loc[df['Status'] == 'Terminated', 'revenue'] = np.nan
df = df.groupby(['client']).bfill()

The following code is all you need if your dataframe is sorted as shown

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150785

Here's my approach with map and np.where:

# this requies the data be sorted descendingly by time, as shown in your sample
maps = df[df['Status']=='Active'].groupby('client')['revenue'].first()

df['revenue'] = np.where(df['Status']=='Terminated',
                         df['client'].map(maps),
                         df['revenue'])

Output:

     Month      Status  revenue client
0  11/1/19  Terminated   140.45      A
1  10/1/19  Terminated   140.45      A
2   9/1/19  Terminated   140.45      A
3   8/1/19  Terminated   140.45      A
4   7/1/19  Terminated   140.45      A
5   6/1/19  Terminated   140.45      A
6   5/1/19      Active   140.45      A
7   4/1/19      Active    90.23      A

Option 2:

# sort by time and client:
df = df.sort_values(['client','time'], ascending=[True, False])

# backfill
df['revenue'] = df['revenue'].mask(df['Status']=='Terminated').bfill()

Upvotes: 1

Related Questions