Reputation: 1222
There is a Pandas DataFrame with 4 columns:
col1 col2 col3 col4
0 orange NaN NaN NaN
1 NaN tomato NaN NaN
2 NaN NaN apple NaN
3 NaN NaN NaN carrot
4 NaN potato NaN NaN
Each row contains only one string value, which may be present in any column. Other columns in that row are NaN. I want to create one column, which contains string values:
col5
0 orange
1 tomato
2 apple
3 carrot
4 potato
The most obvious approach looks like:
data['col5'] = data.col1.astype(str) + data.col2.astype(str)...
and then remove "nan" from the output strings, but it's messy and error-prone.
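For concreteness, a minimal sketch (reconstructing the frame above) of what the concatenation route produces:

```python
import numpy as np
import pandas as pd

# Reconstruct the example frame from the question
data = pd.DataFrame({
    'col1': ['orange', np.nan, np.nan, np.nan, np.nan],
    'col2': [np.nan, 'tomato', np.nan, np.nan, 'potato'],
    'col3': [np.nan, np.nan, 'apple', np.nan, np.nan],
    'col4': [np.nan, np.nan, np.nan, 'carrot', np.nan],
})

# astype(str) turns each NaN into the literal string 'nan' ...
combined = (data.col1.astype(str) + data.col2.astype(str)
            + data.col3.astype(str) + data.col4.astype(str))
print(combined[0])          # 'orangenannannan'

# ... which then has to be stripped back out; a naive replacement
# would also mangle real values containing 'nan' (e.g. 'banana')
cleaned = combined.str.replace('nan', '', regex=False)
```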
Does Pandas offer any simple way of doing this?
Upvotes: 2
Views: 1964
Reputation: 353009
Another way (assuming that each row contains one string value and the rest are NaN, not the string "NaN") would be to fillna and then use max along the rows:
>>> df.fillna('').max(axis=1)
0 orange
1 tomato
2 apple
3 carrot
4 potato
dtype: object
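This works because after fillna('') every other cell in the row is the empty string, which compares less than any non-empty string, so max picks out the lone value. A small sketch of the comparison it relies on (the two-row frame here is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['orange', np.nan],
                   'col2': [np.nan, 'tomato']})

# '' sorts before any non-empty string, so max(axis=1) keeps the value
out = df.fillna('').max(axis=1)
print(out.tolist())   # ['orange', 'tomato']

# Caveat: an all-NaN row would come back as '' rather than NaN
```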
Upvotes: 2
Reputation: 375415
Here's one way, with apply and first_valid_index:
In [11]: df.apply(lambda x: x[x.first_valid_index()], axis=1)
Out[11]:
0 orange
1 tomato
2 apple
3 carrot
4 potato
dtype: object
To get these efficiently you could drop to numpy:
In [21]: df.values.ravel()[np.arange(0, len(df.index) * len(df.columns), len(df.columns)) + np.argmax(df.notnull().values, axis=1)]
Out[21]: array(['orange', 'tomato', 'apple', 'carrot', 'potato'], dtype=object)
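As a self-contained sketch of that flat-indexing trick: the arange term gives each row's starting offset in the raveled (row-major) array, and argmax over the notnull mask finds the first non-null column in each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['orange', np.nan],
                   'col2': [np.nan, 'tomato']})

flat = df.values.ravel()                       # row-major flattening
row_starts = np.arange(0, len(df.index) * len(df.columns), len(df.columns))
first_valid = np.argmax(df.notnull().values, axis=1)  # first non-null column per row
result = flat[row_starts + first_valid]
print(list(result))   # ['orange', 'tomato']
```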
Note: both will fail if you have rows that are all NaN; you should filter these out first (e.g. with dropna).
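A sketch of that filtering step, assuming an all-NaN row has crept in:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['orange', np.nan],
                   'col2': [np.nan, np.nan]})  # second row is all NaN

# Drop rows where every column is NaN before applying either approach
clean = df.dropna(how='all')
out = clean.apply(lambda x: x[x.first_valid_index()], axis=1)
print(out.tolist())   # ['orange']
```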
Upvotes: 2
Reputation: 1
Mapping a filter across each row's elements should do it. Note that apply needs axis=1 to work row-wise, and in Python 3 filter returns an iterator, so next is used instead of indexing:
data['new_col'] = data.apply(lambda row: next(filter(lambda elem: not pd.isnull(elem), row)), axis=1)
Upvotes: 0