Reputation: 3
I have the dataframe below and I want another column that displays the column names of the values that are not NaN. Like this:
New Column
y32
y9
y11
y9
and so on.
y9 y2 y11 y32
1 NaN NaN NaN 0.201229
2 0.053303 NaN NaN NaN
4 NaN NaN 0.442415 NaN
6 0.000529 NaN NaN NaN
I tried to do it with for loops but couldn't quite get there. Help is very appreciated.
Upvotes: 0
Views: 1338
Reputation: 4771
Here are a couple approaches without using pandas.DataFrame.apply
:
pandas.DataFrame.idxmax
with axis = 1
(columns)>>> df["New Column"] = df.idxmax(1)
numpy.where
and pandas.DataFrame.notna
>>> df["New Column"] = df.columns[np.where(df.notna())[1]]
In both cases the resulting dataframe is:
y9 y2 y11 y32 New Column
1 NaN NaN NaN 0.201229 y32
2 0.053303 NaN NaN NaN y9
4 NaN NaN 0.442415 NaN y11
6 0.000529 NaN NaN NaN y9
Upvotes: 2
Reputation: 12140
You can use idxmax
with skipna=True
and apply it to each row:
def f(r):
return r.idxmax(skipna=True)
df['New Column'] = df.apply(f, axis=1)
Or a one-liner:
df['New Column'] = df.apply(lambda r: r.idxmax(skipna=True), axis=1)
UPD:
A better way would be to use Series.first_valid_index()
:
df['New Column'] = df.apply(pd.Series.first_valid_index, axis=1)
Upvotes: 0