Reputation: 1383
I have a dataframe:
df = pd.DataFrame({'cust': {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'B',
5: 'B',
6: 'B',
7: 'B',
8: 'B'},
'value': {0: 6, 1: 10, 2: 11, 3: 15, 4: 6, 5: 12, 6: 21, 7: 29, 8: 33},
'signal': {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}})
cust value signal
0 A 6 0
1 A 10 1
2 A 11 1
3 A 15 0
4 B 6 1
5 B 12 0
6 B 21 0
7 B 29 0
8 B 33 0
When signal != 0 I should replace "value" with its previous value for each customer. For example, in row with index 1, value = 10 should be replaced with its previous value, which is 6. In row with index 4, I cannot replace value 6 with its previous value, because there is no previous value for customer "B". In this case I should replace value with 0.
How to do that in the most performant way because I have a dataframe with 50 million rows?
Upvotes: 1
Views: 80
Reputation: 61930
IIUC, you could use groupby (on cust column) transform using shift, fillna and then select the appropriate value with np.where:
# find previous value from each customer group
res = df.groupby('cust')['value'].transform('shift').fillna(0)
# replace values
df['value'] = np.where(df['signal'].ne(0), res, df['value'])
print(df)
Output
cust value signal
0 A 6.0 0
1 A 6.0 1
2 A 10.0 1
3 A 15.0 0
4 B 0.0 1
5 B 12.0 0
6 B 21.0 0
7 B 29.0 0
8 B 33.0 0
UPDATE
If you need to propagate the last valid value, do the following:
# make invalid values na, to use ffill
df['value'] = np.where(df['signal'].ne(0), np.nan, df['value'])
# use ffill, for values at the beginning of the group fillna(0)
df['value'] = df.groupby('cust')['value'].transform('ffill').fillna(0)
print(df)
Output
cust value signal
0 A 6.0 0
1 A 6.0 1
2 A 6.0 1
3 A 15.0 0
4 B 0.0 1
5 B 12.0 0
6 B 21.0 0
7 B 29.0 0
8 B 33.0 0
Upvotes: 4