Reputation: 427
I have two columns in a dataframe and I need to create a new one based on them. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'a':[1.0,1.0,2.0], 'b':[3.0,3.0,3.0]})
df.iloc[1,0]=np.nan
     a    b
0  1.0  3.0
1  NaN  3.0
2  2.0  3.0
I need to add a column c that takes its value from a when a is not null, and from b otherwise, like this:
     a    b    c
0  1.0  3.0  1.0
1  NaN  3.0  3.0
2  2.0  3.0  2.0
Here is what I have tried:
def dist(df):
    if df['a']:
        return df.a
    else:
        return df.b

df['c']=df.apply(dist,axis=1)
but the result is not what I expected. Can anyone suggest what I should do? Thx!
Upvotes: 3
Views: 707
Reputation: 2153
>>> df['c'] = df.a.where(~np.isnan(df.a), df.b)
>>> df
    a  b  c
0   1  3  1
1 NaN  3  3
2   2  3  2
It is tempting to write the more compact:
df['c'] = df.a.where(df.a, df.b)
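but this won't do the right thing when df.a[k] == 0, because 0 is also interpreted as False. NaN, by contrast, is truthy, so a plain truthiness test does not pick out the missing values either.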
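The truthiness point can be checked in plain Python, without pandas. The sketch below is only an illustration (the variable names are made up for it), and it also suggests why the `if df['a']:` test in the question never falls through to b: a NaN float counts as true.

```python
import math

nan = float("nan")

# NaN is a non-zero float, so Python treats it as truthy; 0.0 is falsy.
nan_is_truthy = bool(nan)
zero_is_truthy = bool(0.0)

# An explicit null test is what actually identifies the missing value.
nan_is_missing = math.isnan(nan)
zero_is_missing = math.isnan(0.0)

print(nan_is_truthy, zero_is_truthy, nan_is_missing, zero_is_missing)
```

So a condition based on truthiness gets both cases backwards: it keeps NaN and rejects 0.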
Instead of isnan, you can rely on the fact that NaN is the only value that is not equal to itself, so the following also works:
df['c'] = df.a.where(df.a==df.a, df.b)
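As an alternative to building the mask by hand, pandas' own null handling should give the same column. This is a sketch of a different technique from the `where` calls above, not a replacement for them: it uses `Series.fillna` with another Series as the fill value, and `Series.notna` for the mask. The frame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as in the question: a has one missing value, b has none.
df = pd.DataFrame({"a": [1.0, np.nan, 2.0], "b": [3.0, 3.0, 3.0]})

# Take a where it is present; fill the gaps from b (aligned on the index).
df["c"] = df["a"].fillna(df["b"])

# The explicit-mask form, written with notna() instead of ~np.isnan().
via_where = df["a"].where(df["a"].notna(), df["b"])

print(df)
```

Both forms leave a value of 0 in a untouched, since they test for null rather than truthiness.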
Upvotes: 1