Pandas, select a single column where a second column has a NaN value

Question

I have a data frame that looks like this:

    a   b   c
0   Alabama[edit]   NaN NaN
1   Auburn  (Auburn University)[1]
2   Florence    (University of
3   Jacksonville    (Jacksonville   State
4   Livingston  (University of

I'd like to add a column to the dataframe called 'State' that copies the value of column 'a' when column 'b' has a NaN value, otherwise it will just place a NaN value in the state column.

I have tried:

df['State'] = np.where(df['b'] == np.NaN, df['a'], np.NaN)
df['State'] = df.loc[df['b'] == np.NaN, 'a']

However for some reason both of these do not seem to evaluate the np.NaN. If i modify the criteria to == '(Auburn' then it finds the row and correctly copies the value of column 'a' into 'State'

If i use this function: df1 = df[df['b'].isnull()] then i get all the relevant rows but in a new dataframe which i was trying to avoid.

Any help much appreciated. Thanks JP

cs95 · Accepted Answer

Your mistake is in your belief that df['b'] == np.NaN selects NaNs... it does not, as this example shows:

In [1]: np.nan == np.nan
Out[1]: False

This is the mathematical definition of NaN. Since NaN != NaN, doing an equality comparison on NaN just won't cut it. Use isna or isnull or np.isnan, these functions are meant for this very purpose.

For example,

df['State'] = np.where(df['b'].isnull(), df['a'], np.NaN)

Or,

df['State'] = df.loc[df['b'].isnull(), 'a']

Pandas, select a single column where a second column has a NaN value

Answers (2)

Related Questions