cyril
cyril

Reputation: 3006

Pandas: merge dataframes and create new conditional columns

I have two large pandas dataframes (with millions of rows) that contain two columns, a group identifier and an id. I'm trying to create a combined dataframe that contains the group, id, plus a column with a 1 if the id was in the first dataframe, else a 0 and a column with a 1 if the id was in the second dataframe, else a 0.

In other words, I'm trying to merge the two dataframes and create conditional columns based on if the id was present for each original dataframe. Any suggestions on how to approach this problem?

Here is a small example:

import pandas as pd

>>> df_a = pd.DataFrame({'group': list('AAABBB'), 'id': [11,12,13,21,22,23]})
>>> df_b = pd.DataFrame({'group': list('AAABB'), 'id': [11,13,14,22,24]})
>>> df_a

 group  id
 A     11
 A     12
 A     13
 B     21
 B     22
 B     23

>>> df_b

group  id
A     11
A     13
A     14
B     22
B     24

The output should look like this:

>>> df_full      
 group  id  a  b
 A      11  1  1
 A      12  1  0
 A      13  1  1
 A      14  0  1
 B      21  1  0
 B      22  1  1
 B      23  1  0
 B      24  0  1

Upvotes: 1

Views: 310

Answers (1)

akuiper
akuiper

Reputation: 215047

You can create two columns for each of the data frame with one before merging and fill na with zero after merging:

df_a['a'] = 1
df_b['b'] = 1

pd.merge(df_a, df_b, how = 'outer', on = ['group', 'id']).fillna(0)

# group   id      a   b
# 0   A 11.0    1.0 1.0
# 1   A 12.0    1.0 0.0
# 2   A 13.0    1.0 1.0
# 3   B 21.0    1.0 0.0
# 4   B 22.0    1.0 1.0
# 5   B 23.0    1.0 0.0
# 6   A 14.0    0.0 1.0
# 7   B 24.0    0.0 1.0

Upvotes: 2

Related Questions