Michael

Reputation: 2566

How to concatenate pandas DataFrame with built-in logic?

I have two pandas DataFrames and I would like to produce the output shown in the expected DataFrame.

import pandas as pd

df1 = pd.DataFrame({'a':['aaa', 'bbb', 'ccc', 'ddd'],
                    'b':['eee', 'fff', 'ggg', 'hhh']})
df2 = pd.DataFrame({'a':['aaa', 'bbb', 'ccc', 'ddd'],
                    'b':['eee', 'fff', 'ggg', 'hhh'],
                    'update': ['', 'X', '', 'Y']})
expected = pd.DataFrame({'a': ['aaa', 'bbb', 'ccc', 'ddd'],
                         'b': ['eee', 'X', 'ggg', 'Y']})

I tried to apply some concatenation logic but this is not producing the expected output.

df1.set_index('b')
df2.set_index('update')
out = pd.concat([df1[~df1.index.isin(df2.index)], df2])

print(out)
     a    b update
0  aaa  eee
1  bbb  fff      X
2  ccc  ggg
3  ddd  hhh      Y

From this output I can produce the expected output, but I was wondering whether this logic can be built directly into the concat call.

def fx(row):
    # Compare by value; `is not ''` tests object identity and is unreliable for strings
    if row['update'] != '':
        row['b'] = row['update']
    return row

result = out.apply(fx, axis=1)
result.drop('update', axis=1, inplace=True)
print(result)
     a    b
0  aaa  eee
1  bbb    X
2  ccc  ggg
3  ddd    Y

Upvotes: 3

Views: 85

Answers (3)

Bharath M Shetty

Reputation: 30605

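Use the built-in update method after replacing '' with NaN, since update only takes the non-NaN values from the passed Series:

import numpy as np

df1['b'].update(df2['update'].replace('', np.nan))
print(df1)

     a    b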
0  aaa  eee
1  bbb    X
2  ccc  ggg
3  ddd    Y

You can also use np.where:

out = df1.assign(b=np.where(df2['update'].eq(''), df2['b'], df2['update']))
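
For reference, here is a minimal self-contained sketch of the np.where approach using the frames from the question. The name has_update is only illustrative and is not from the original answer. Note that np.where works positionally on the underlying arrays, so this assumes both frames have the same row order.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': ['aaa', 'bbb', 'ccc', 'ddd'],
                    'b': ['eee', 'fff', 'ggg', 'hhh']})
df2 = pd.DataFrame({'a': ['aaa', 'bbb', 'ccc', 'ddd'],
                    'b': ['eee', 'fff', 'ggg', 'hhh'],
                    'update': ['', 'X', '', 'Y']})

# Boolean mask: True where df2 carries a non-empty update value
has_update = df2['update'].ne('')

# Pick the update value where present, otherwise keep the original 'b'.
# assign returns a new DataFrame and leaves df1 untouched.
out = df1.assign(b=np.where(has_update, df2['update'], df1['b']))
print(out)
```

Because assign returns a new frame, df1 keeps its original values, which can be convenient if you need both versions.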

Upvotes: 5

Vivek Harikrishnan

Reputation: 866

How about using mask?

df1['b'].update(df2.mask(df2=='')['update'])


>>> df1
     a    b
0  aaa  eee
1  bbb    X
2  ccc  ggg
3  ddd    Y
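
A possible variant, sketched here as an assumption rather than as part of the original answer: call mask on df1['b'] directly and assign the result back, instead of updating in place through df1['b']. This may be preferable if your pandas version has Copy-on-Write enabled, where an in-place update on a selected column might not propagate to df1. The names new_b and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['aaa', 'bbb', 'ccc', 'ddd'],
                    'b': ['eee', 'fff', 'ggg', 'hhh']})
df2 = pd.DataFrame({'a': ['aaa', 'bbb', 'ccc', 'ddd'],
                    'b': ['eee', 'fff', 'ggg', 'hhh'],
                    'update': ['', 'X', '', 'Y']})

# Where df2['update'] is non-empty, take its value; elsewhere keep df1['b'].
# Series.mask aligns on the index, so both frames need matching index labels.
new_b = df1['b'].mask(df2['update'].ne(''), df2['update'])

# Build a new frame instead of mutating df1
result = df1.assign(b=new_b)
print(result)
```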

Upvotes: 3

jezrael

Reputation: 863291

Use combine_first or fillna:

df1['b'] = df2['update'].mask(lambda x: x == '').combine_first(df1['b'])
# alternative
# df1['b'] = df2['update'].mask(lambda x: x == '').fillna(df1['b'])

print(df1)
     a    b
0  aaa  eee
1  bbb    X
2  ccc  ggg
3  ddd    Y

Note that both DataFrames must have the same index values for this to work.
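
If the index values do not match, one option is to align on a key column instead. The sketch below assumes that column 'a' uniquely identifies each row, and it uses a hypothetical reordered df2 with a different index to illustrate the point; neither assumption comes from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['aaa', 'bbb', 'ccc', 'ddd'],
                    'b': ['eee', 'fff', 'ggg', 'hhh']})

# Hypothetical variant of df2: rows in a different order, with a different index
df2 = pd.DataFrame({'a': ['ddd', 'ccc', 'bbb', 'aaa'],
                    'update': ['Y', '', 'X', '']},
                   index=[10, 11, 12, 13])

# Lookup Series keyed by 'a'; empty strings become NaN so they can be filled later
updates = df2.set_index('a')['update'].mask(lambda s: s.eq(''))

# Map each key in df1 to its update value (NaN where there is none),
# then fall back to the original 'b' for the missing entries
aligned = df1['a'].map(updates)
result = df1.assign(b=aligned.fillna(df1['b']))
print(result)
```

Mapping through the key column sidesteps index alignment entirely, at the cost of requiring unique keys in df2.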

Upvotes: 3
