Reputation: 1222
There is a Pandas DataFrame with 4 columns:
col1 col2 col3 col4
0 orange NaN NaN NaN
1 NaN tomato NaN NaN
2 NaN NaN apple NaN
3 NaN NaN NaN carrot
4 NaN potato NaN NaN
Each row contains only one string value, which may be present in any column. Other columns in that row are NaN. I want to create one column, which contains string values:
col5
0 orange
1 tomato
2 apple
3 carrot
4 potato
The most obvious approach looks like:
data['col5'] = data.col1.astype(str) + data.col2.astype(str)...
and then remove "nan" from the output strings, but it's messy and error-prone.
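For concreteness, a minimal sketch (reconstructing the frame above) of what the concatenation route produces:

```python
import numpy as np
import pandas as pd

# Reconstruct the example frame from the question
data = pd.DataFrame({
    'col1': ['orange', np.nan, np.nan, np.nan, np.nan],
    'col2': [np.nan, 'tomato', np.nan, np.nan, 'potato'],
    'col3': [np.nan, np.nan, 'apple', np.nan, np.nan],
    'col4': [np.nan, np.nan, np.nan, 'carrot', np.nan],
})

# astype(str) turns each NaN into the literal string 'nan' ...
combined = (data.col1.astype(str) + data.col2.astype(str)
            + data.col3.astype(str) + data.col4.astype(str))
print(combined[0])          # 'orangenannannan'

# ... which then has to be stripped back out; a naive replacement
# would also mangle real values containing 'nan' (e.g. 'banana')
cleaned = combined.str.replace('nan', '', regex=False)
```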
Does Pandas offer any simple way of doing this?
Upvotes: 2
Views: 1964
Reputation: 353009
Another way (assuming that each row contains one string value and the rest are NaN, not the string "NaN") would be to fillna and then use max along the rows:
>>> df.fillna('').max(axis=1)
0 orange
1 tomato
2 apple
3 carrot
4 potato
dtype: object
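This works because after fillna('') every other cell in the row is the empty string, which compares less than any non-empty string, so max picks out the lone value. A small sketch of the comparison it relies on (the two-row frame here is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['orange', np.nan],
                   'col2': [np.nan, 'tomato']})

# '' sorts before any non-empty string, so max(axis=1) keeps the value
out = df.fillna('').max(axis=1)
print(out.tolist())   # ['orange', 'tomato']

# Caveat: an all-NaN row would come back as '' rather than NaN
```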
Upvotes: 2
Reputation: 375415
Here's one way, with apply and first_valid_index:
In [11]: df.apply(lambda x: x[x.first_valid_index()], axis=1)
Out[11]:
0 orange
1 tomato
2 apple
3 carrot
4 potato
dtype: object
To get these efficiently you could drop to numpy:
In [21]: df.values.ravel()[np.arange(0, len(df.index) * len(df.columns), len(df.columns)) + np.argmax(df.notnull().values, axis=1)]
Out[21]: array(['orange', 'tomato', 'apple', 'carrot', 'potato'], dtype=object)
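As a self-contained sketch of that flat-indexing trick: the arange term gives each row's starting offset in the raveled (row-major) array, and argmax over the notnull mask finds the first non-null column in each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['orange', np.nan],
                   'col2': [np.nan, 'tomato']})

flat = df.values.ravel()                       # row-major flattening
row_starts = np.arange(0, len(df.index) * len(df.columns), len(df.columns))
first_valid = np.argmax(df.notnull().values, axis=1)  # first non-null column per row
result = flat[row_starts + first_valid]
print(list(result))   # ['orange', 'tomato']
```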
Note: both will fail if you have rows that are all NaN; you should filter these out first (e.g. with dropna).
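A sketch of that filtering step, assuming an all-NaN row has crept in:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['orange', np.nan],
                   'col2': [np.nan, np.nan]})  # second row is all NaN

# Drop rows where every column is NaN before applying either approach
clean = df.dropna(how='all')
out = clean.apply(lambda x: x[x.first_valid_index()], axis=1)
print(out.tolist())   # ['orange']
```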
Upvotes: 2
Reputation: 1
Mapping a filter across each row's elements should do it. Note that apply needs axis=1 to work row-wise, and in Python 3 filter returns an iterator, so next is used instead of indexing:
data['new_col'] = data.apply(lambda row: next(filter(lambda elem: not pd.isnull(elem), row)), axis=1)
Upvotes: 0