Reputation: 909
I have a dataframe like the following:
import numpy as np
import pandas as pd
data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1], 'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan]}
df = pd.DataFrame(data, columns=['col1', 'col2'])
print(df)
   col1  col2
0     1   NaN
1     3   1.0
2     3   NaN
3     1   1.0
4     2   NaN
5     3   NaN
6     2   NaN
7     2   2.0
8     1   NaN
I am trying to make a third column that fills in the NaN values in col2 if the value of col2 is equal to 1.0 or the row above in col2 is 1.0. The final dataframe would look like this:
   col1  col2  col3
0     1   NaN   NaN
1     3   1.0   1.0
2     3   NaN   1.0
3     1   1.0   1.0
4     2   NaN   1.0
5     3   NaN   1.0
6     2   NaN   1.0
7     2   2.0   2.0
8     1   NaN   NaN
The first approach I tried was:
df['col3'] = ((df['col2']== 1) | ((df['col2'].shift()== 1))).astype('int')
This leaves me with this dataframe:
   col1  col2  col3
0     1   NaN     0
1     3   1.0     1
2     3   NaN     1
3     1   1.0     1
4     2   NaN     1
5     3   NaN     0
6     2   NaN     0
7     2   2.0     0
8     1   NaN     0
This corrects the first missing value after a 1.0, but does not continue to fill the missing values that follow it. I also tried using the np.where() function and got the same results.
Is there a way to write this in pandas where it fixes multiple instances in a row?
Upvotes: 4
Views: 3938
Reputation: 33843
You can use np.where by looking at where the forward-fill is equal to one, filling 1 where it's True, and falling back to the value of 'col2' when it's False:
df['col2'] = np.where(df['col2'].ffill() == 1, 1, df['col2'])
The resulting output:
   col1  col2
0     1   NaN
1     3   1.0
2     3   1.0
3     1   1.0
4     2   1.0
5     3   1.0
6     2   1.0
7     2   2.0
8     1   NaN
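The line above overwrites col2. If you would rather leave col2 untouched and write the result to a new col3, as the question asks, the same expression can be assigned to a different column. A minimal sketch, assuming the sample data from the question (the intermediate name ffilled is only for illustration):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1],
    'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan],
})

# Forward-fill carries the last seen value down through runs of NaN
ffilled = df['col2'].ffill()

# Use 1.0 where the carried value is 1, otherwise keep col2 as it was
df['col3'] = np.where(ffilled == 1, 1.0, df['col2'])
print(df)
```

Because the condition is checked against the forward-filled series rather than against a single shift(), a run of several NaN rows after a 1.0 is filled in one pass.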
Upvotes: 6
Reputation: 294506
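# Forward-fill col2, then keep only the rows whose filled value is 1
ffilled = df.col2.ffill()
# fillna aligns on the index, so only those rows get filled
df.assign(col3=df.col2.fillna(ffilled[ffilled == 1]))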
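To see what each step contributes, here is a self-contained version that can be run as is. It is only a sketch that assumes the question's sample data; the names ones_only and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1],
    'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan],
})

ffilled = df['col2'].ffill()

# Boolean indexing drops every row whose forward-filled value is not 1,
# so this series only has entries for the rows that may be filled
ones_only = ffilled[ffilled == 1]
print(ones_only.index.tolist())

# fillna with a Series matches on index labels; NaN rows without a
# matching label in ones_only are left as NaN
result = df.assign(col3=df['col2'].fillna(ones_only))
print(result)
```

Note that assign returns a new dataframe and leaves df itself unchanged, so the result has to be kept in a variable.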
Upvotes: 2
Reputation: 533
You can forward-fill the missing values with df.ffill() (older pandas versions also accept df.fillna(method='pad'), which is deprecated in recent releases):
df.ffill()
   col1  col2
0     1   NaN
1     3   1.0
2     3   1.0
3     1   1.0
4     2   1.0
5     3   1.0
6     2   1.0
7     2   2.0
8     1   2.0
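Note that a plain forward fill also fills row 8 with 2.0, whereas the output requested in the question leaves that row as NaN. One possible way to restrict the fill to runs that follow a 1.0 is to mask the forward-filled column with Series.where. This is a sketch of a different technique from the one-liner above, assuming the question's sample data; the names filled and keep are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1],
    'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan],
})

# Plain forward fill: every NaN after the first valid value is filled
filled = df['col2'].ffill()

# Keep a filled value only if it is 1, or if col2 already had a value there
keep = (filled == 1) | df['col2'].notna()

# Series.where keeps values where the condition is True and puts NaN elsewhere
df['col3'] = filled.where(keep)
print(df)
```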
Upvotes: 3