Reputation: 3786
Given this DataFrame:
import pandas as pd
import numpy as np
data = {'column1': [True, False, False, True, True],
        'column2': [np.nan, 0.21, np.nan, 0.2222, np.nan],
        'column3': [1000, 0, 0, 0, 0]}
df = pd.DataFrame.from_dict(data)
print(df)
   column1  column2  column3
0     True      NaN     1000
1    False   0.2100        0
2    False      NaN        0
3     True   0.2222        0
4     True      NaN        0
How can I multiply the value in column2 by the previous value of column3 when column2 is not NaN, and otherwise just carry the previous value of column3 forward?
The result should look like this:
   column1  column2  column3
0     True      NaN     1000
1    False   0.2100      210
2    False      NaN      210
3     True   0.2222   46.662
4     True      NaN   46.662
I've been browsing through similar questions, but I just can't get my head around it.
I'd appreciate your input :)
Upvotes: 3
Views: 453
Reputation: 75080
You can give this a try:
# replace 0 with NaN in column3, working on a copy of the df
m = df.assign(column3=df.column3.replace(0, np.nan))
# forward-fill along axis 1 (across columns) in rows where column2 is not null,
# take the last column, then cumprod and ffill down the rows
final = df.assign(column3=m.mask(m.column2.notna(), m.ffill(axis=1)).iloc[:, -1].cumprod().ffill())
   column1  column2   column3
0     True      NaN  1000.000
1    False   0.2100   210.000
2    False      NaN   210.000
3     True   0.2222    46.662
4     True      NaN    46.662
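For comparison, here is a shorter sketch of the same cumulative-product idea. It is not the code from the answer above: it assumes that a NaN in column2 can be treated as a factor of 1 and that the starting value is whatever sits in the first row of column3. The name `factors` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'column1': [True, False, False, True, True],
    'column2': [np.nan, 0.21, np.nan, 0.2222, np.nan],
    'column3': [1000, 0, 0, 0, 0],
})

# Assumption: a missing factor means "multiply by 1", i.e. carry the value over.
factors = df['column2'].fillna(1)

# Running product of the factors, scaled by the starting value in row 0.
df['column3'] = df['column3'].iloc[0] * factors.cumprod()
print(df)
```

Because everything is expressed as one running product, no row-by-row loop is needed, and column3 ends up as a float column.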
Upvotes: 2
Reputation: 4315
Example:
import pandas as pd
import numpy as np
data = {'column1': [True, False, False, True, True],
        'column2': [np.nan, 0.21, np.nan, 0.2222, np.nan],
        'column3': [1000, 0, 0, 0, 0]}
df = pd.DataFrame.from_dict(data)
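# column3 starts out as an integer column; cast it to float so the
# fractional products can be stored without losing the decimals
df['column3'] = df['column3'].astype(float)
pre_idx = 0
for idx in df.index:
    is_nan = pd.isnull(df.at[idx, 'column2'])
    if idx != 0:
        pre_idx = idx - 1
    # multiply the previous column3 value by column2, or by 1 if column2 is NaN
    df.at[idx, 'column3'] = df.at[pre_idx, 'column3'] * (1 if is_nan else df.at[idx, 'column2'])
print(df)
Output:
   column1  column2   column3
0     True      NaN  1000.000
1    False   0.2100   210.000
2    False      NaN   210.000
3     True   0.2222    46.662
4     True      NaN    46.662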
Upvotes: 1
Reputation: 2212
I would define a small helper class that accumulates the last value of column3 and then iterate over the rows to do the computation. This way you avoid writing an explicit for loop and concentrate the computation in a single map call. Note that the accumulator is stateful, so the rows have to be processed in order.
class AccumMult:
    def __init__(self):
        self.last_val = None

    def mult(self, c2, c3):
        # start from the first column3 value, then keep the running value
        self.last_val = c3 if self.last_val is None else self.last_val
        if not np.isnan(c2):
            self.last_val = self.last_val * c2
        return self.last_val

m = AccumMult()
df["column3"] = list(map(lambda x: m.mult(x[0], x[1]), df[["column2", "column3"]].values.tolist()))
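If you would rather not keep the state in a class, the same running value can be sketched with `itertools.accumulate` from the standard library. This is an alternative, not the code above; it assumes Python 3.8 or newer (for the `initial` argument), and `factors` and `running` are illustrative names.

```python
from itertools import accumulate
from operator import mul

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'column1': [True, False, False, True, True],
    'column2': [np.nan, 0.21, np.nan, 0.2222, np.nan],
    'column3': [1000, 0, 0, 0, 0],
})

# Assumption: a NaN in column2 acts as a factor of 1 (value is carried over).
factors = [1.0 if np.isnan(c2) else c2 for c2 in df['column2']]

# accumulate() yields the initial value first, then one running product per
# factor, so the leading element is dropped to get one value per row.
start = float(df.at[0, 'column3'])
running = list(accumulate(factors, mul, initial=start))[1:]

df['column3'] = running
print(df)
```

As with the class-based version, the values are still computed strictly in row order.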
Upvotes: 1
Reputation: 2576
Since the value in each row depends on the rows before it, the simplest approach is to iterate over the rows. You could do:
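# column3 starts out as integers; cast to float so fractional products fit
df['column3'] = df['column3'].astype(float)
prev = df.at[0, 'column3']
for e, row in df.iterrows():
    prev = df.at[e, 'column3'] = prev * (1 if np.isnan(row.column2) else row.column2)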
Upvotes: 0