Python pandas replacing changing cell value by condition from other column

Question

I have a dataframe:

df = pd.DataFrame({'cust': {0: 'A',
  1: 'A',
  2: 'A',
  3: 'A',
  4: 'B',
  5: 'B',
  6: 'B',
  7: 'B',
  8: 'B'},
 'value': {0: 6, 1: 10, 2: 11, 3: 15, 4: 6, 5: 12, 6: 21, 7: 29, 8: 33},
 'signal': {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}})


  cust  value  signal
0    A      6       0
1    A     10       1
2    A     11       1
3    A     15       0
4    B      6       1
5    B     12       0
6    B     21       0
7    B     29       0
8    B     33       0

When signal != 0 I should replace "value" with its previous value for each customer. For example, in row with index 1, value = 10 should be replaced with its previous value, which is 6. In row with index 4, I cannot replace value 6 with its previous value, because there is no previous value for customer "B". In this case I should replace value with 0.

How to do that in the most performant way because I have a dataframe with 50 million rows?

Dani Mesejo · Accepted Answer

IIUC, you could use groupby (on cust column) transform using shift, fillna and then select the appropriate value with np.where:

# find previous value from each customer group
res = df.groupby('cust')['value'].transform('shift').fillna(0)

# replace values
df['value'] = np.where(df['signal'].ne(0), res, df['value'])

print(df)

Output

  cust  value  signal
0    A    6.0       0
1    A    6.0       1
2    A   10.0       1
3    A   15.0       0
4    B    0.0       1
5    B   12.0       0
6    B   21.0       0
7    B   29.0       0
8    B   33.0       0

UPDATE

If you need to propagate the last valid value, do the following:

# make invalid values na, to use ffill
df['value'] = np.where(df['signal'].ne(0), np.nan, df['value'])

# use ffill, for values at the beginning of the group fillna(0)
df['value'] = df.groupby('cust')['value'].transform('ffill').fillna(0)

print(df)

Output

  cust  value  signal
0    A    6.0       0
1    A    6.0       1
2    A    6.0       1
3    A   15.0       0
4    B    0.0       1
5    B   12.0       0
6    B   21.0       0
7    B   29.0       0
8    B   33.0       0

Python pandas replacing changing cell value by condition from other column

Answers (1)

Related Questions