Bera
Bera

Reputation: 1949

Concatenate two columns

I have two text columns A and B. I want to take the first non empty string or if both A and B has values take the values from A. C is the column im trying to create:

import pandas as pd

cols = ['A','B']
data = [['data','data'],
        ['','data'],
        ['',''],
        ['data1','data2']]

df = pd.DataFrame.from_records(data=data, columns=cols)

      A      B
0  data   data
1         data
2             
3  data1  data2

My attempt:

df['C'] = df[cols].apply(lambda row: sorted([val if val else '' for val in row], reverse=True)[0], axis=1) #Reverse sort to avoid picking an empty string

      A      B      C
0  data   data      data
1         data      data
2             
3  data1  data2     data2 #I want data1 here

Expected output:

      A      B      C
0  data   data      data
1         data      data
2             
3  data1  data2     data1

I think I want the pandas equivalent of SQL coalesce.

Upvotes: 2

Views: 83

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Let's try idxmax + lookup:

df['C'] = df.lookup(df.index, df.ne('').idxmax(1))

Alternatively you can use Series.where:

df['C'] = df['A'].where(lambda x: x.ne(''), df['B'])

       A      B      C
0   data   data   data
1          data   data
2                     
3  data1  data2  data1

Upvotes: 3

Mayank Porwal
Mayank Porwal

Reputation: 34056

You can also use numpy.where:

In [1022]: import numpy as np
In [1023]: df['C'] = np.where(df['A'].eq(''), df['B'], df['A'])

In [1024]: df
Out[1024]: 
       A      B      C
0   data   data   data
1          data   data
2                     
3  data1  data2  data1

Upvotes: 4

Related Questions