Santosh M.
Santosh M.

Reputation: 2454

Join multiple columns into one column

Lets say I have pandas data frame like this:

import numpy as np
import pandas as pd
my_df = pd.DataFrame(columns=['column 1', 'column 2', 'column 3'], data = [['17.1 g', np.nan, np.nan], [np.nan, '7.0 g', np.nan], [np.nan, '3.7 g', '0.7 g'], [np.nan, np.nan, '1.7 g'], ['1.1 g', np.nan, '1.0 g']])

What I would like to have is:

    column1 column2 column3  column_new
0    17.1 g    NaN    NaN     17.1 g
1     NaN      7.0 g  NaN     7.0 g
2     NaN      3.7 g  0.7 g   3.7 g
3     NaN      NaN    1.7 g   1.7 g
4     1.1 g    NaN    1.0 g   1.1 g

I tried using apply and join as below but not getting the output as I wish:

my_df['column_new'] = my_df[my_df.columns[0:]].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
my_df

    column 1    column 2    column 3    column_new
0   17.1 g       NaN         NaN        17.1 g
1   NaN          7.0 g       NaN        7.0 g
2   NaN          3.7 g       0.7 g      3.7 g 0.7 g
2   NaN          NaN         1.7 g      1.7 g
3   1.1 g        NaN         1.0 g      1.1 g 1.0 g

How can I get output that I want by modifying above code. Thank in advance.

Edit: I am looking solution using above code to get the expected output. Although codes in answers using bfill and row-wise max are nice, they don't work well on the large data that I have.

Upvotes: 0

Views: 72

Answers (3)

manwithfewneeds
manwithfewneeds

Reputation: 1167

df = df.assign(new_col=df.replace('g', '', regex=True).astype(float).max(1).astype(str) + ' g'))

Upvotes: 0

Ben.T
Ben.T

Reputation: 29635

IIUC, you want the max per row, and even if they are not strictly number, it seems to work well at least on the data provided:

my_df['column_new'] = my_df.max(axis=1)
print (my_df)

  column 1 column 2 column 3 column_new
0   17.1 g      NaN      NaN     17.1 g
1      NaN    7.0 g      NaN      7.0 g
2      NaN    3.7 g    0.7 g      3.7 g
3      NaN      NaN    1.7 g      1.7 g
4    1.1 g      NaN    1.0 g      1.1 g

Upvotes: 1

BENY
BENY

Reputation: 323326

We can using bfill

df['New']=df.bfill(axis=1).iloc[:,0]
df
  column 1 column 2 column 3     New
0   17.1 g      NaN      NaN  17.1 g
1      NaN    7.0 g      NaN   7.0 g
2      NaN    3.7 g    0.7 g   3.7 g
3      NaN      NaN    1.7 g   1.7 g
4    1.1 g      NaN    1.0 g   1.1 g

Upvotes: 3

Related Questions