Reputation: 115
I have a data frame that looks like this:
a b c
0 Alabama[edit] NaN NaN
1 Auburn (Auburn University)[1]
2 Florence (University of
3 Jacksonville (Jacksonville State
4 Livingston (University of
I'd like to add a column to the dataframe called 'State' that copies the value of column 'a' when column 'b' has a NaN value, otherwise it will just place a NaN value in the state column.
I have tried:
df['State'] = np.where(df['b'] == np.NaN, df['a'], np.NaN)
df['State'] = df.loc[df['b'] == np.NaN, 'a']
However for some reason both of these do not seem to evaluate the np.NaN. If i modify the criteria to == '(Auburn' then it finds the row and correctly copies the value of column 'a' into 'State'
If i use this function:
df1 = df[df['b'].isnull()]
then i get all the relevant rows but in a new dataframe which i was trying to avoid.
Any help much appreciated. Thanks JP
Upvotes: 3
Views: 103
Reputation: 323286
You can using isnull
with where
,Since NaN has the property that it doesn't equal itself
df.a.where(df['b'].isnull())
Out[112]:
0 Alabama[edit]
1 NaN
2 NaN
3 NaN
4 NaN
Name: a, dtype: object
Upvotes: 1
Reputation: 402553
Your mistake is in your belief that df['b'] == np.NaN
selects NaNs... it does not, as this example shows:
In [1]: np.nan == np.nan
Out[1]: False
This is the mathematical definition of NaN. Since NaN != NaN, doing an equality comparison on NaN just won't cut it. Use isna
or isnull
or np.isnan
, these functions are meant for this very purpose.
For example,
df['State'] = np.where(df['b'].isnull(), df['a'], np.NaN)
Or,
df['State'] = df.loc[df['b'].isnull(), 'a']
Upvotes: 4