Samira Kumar
Samira Kumar

Reputation: 521

Pandas: Create a new column by comparing 2 columns in 2 different data frames

I've 2 data frames in pandas.

in_degree:

    Target  in_degree
0   2   1
1   4   24
2   5   53
3   6   98
4   7   34

out_degree

 Source out_degree
0   1   4
1   2   4
2   3   5
3   4   5
4   5   5

By comparing 2 columns, I'd like to create a new data frame which should add columns "in_degree" and "out_degree" and display the result.

The Sample output should look like

 Source/Target  out_degree
0   1   4
1   2   5
2   3   5
3   4   29
4   5   58

Any help would be appreciated.

Thanks.

Upvotes: 1

Views: 75

Answers (1)

cs95
cs95

Reputation: 402523

Traditionally, this would need a merge, but I think you can take advantage of pandas' index aligned arithmetic to do this a bit faster.

x = df2.set_index('Source')
y = df1.set_index('Target').rename_axis('Source')
y.columns = x.columns

x.add(y.reindex(x.index), fill_value=0).reset_index()

   Source  out_degree
0       1         4.0
1       2         5.0
2       3         5.0
3       4        29.0
4       5        58.0

The "traditional" SQL way of solving this would be using merge:

v = df1.merge(df2, left_on='Target', right_on='Source', how='right')
dct = dict(
    Source=v['Source'],
    out_degree=v['in_degree'].add(v['out_degree'], fill_value=0)) 

pd.DataFrame(dct).sort_values('Source')

   Source  out_degree
3       1         4.0
0       2         5.0
4       3         5.0
1       4        29.0
2       5        58.0

Upvotes: 2

Related Questions