Reputation: 285
I have a dataframe df1 that looks like this:
import pandas as pd

d1 = {'way_id': [4, 3, 1, 8, 5, 2, 7, 9, 6], 'source': [12, 11, 13, 15, 19, 16, 18, 14, 17], 'target': [28, 24, 25, 21, 24, 29, 27, 22, 23]}
df1 = pd.DataFrame(data=d1)
way_id source target
4 12 28
3 11 24
1 13 25
8 15 21
5 19 24
2 16 29
7 18 27
9 14 22
6 17 23
And a second dataframe df2 that looks like this:
d2 = {'node1': [11, 13, 18, 17, 27], 'node2': [24, 29, 27, 25, 18]}
df2 = pd.DataFrame(data=d2)
df2
node1 node2
11 24
13 29
18 27
17 25
27 18
Now I would like to count how often each value pair from the columns node1 and node2 in df2 occurs in the columns source and target of df1. The order within a pair is irrelevant. That's why the desired output should look like this:
way_id source target count
4 12 28 0
3 11 24 1
1 13 25 0
8 15 21 0
5 19 24 0
2 16 29 0
7 18 27 2
9 14 22 0
6 17 23 0
I'm newish to Python and struggle with comparing values across multiple columns between two dataframes. Does anyone have a suggestion?
Upvotes: 1
Views: 213
Reputation: 862511
You can convert the value pairs to frozensets, use Series.value_counts and then Series.map:
# Count each unordered {node1, node2} pair in df2
s = df2[['node1','node2']].agg(frozenset, axis=1).value_counts()
# Build the same unordered keys for df1 and look up the counts (0 if no match)
df1['count'] = df1[['source','target']].agg(frozenset, axis=1).map(s).fillna(0).astype(int)
print(df1)
way_id source target count
0 4 12 28 0
1 3 11 24 1
2 1 13 25 0
3 8 15 21 0
4 5 19 24 0
5 2 16 29 0
6 7 18 27 2
7 9 14 22 0
8 6 17 23 0
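The frozenset is what makes the order irrelevant, since frozenset({18, 27}) == frozenset({27, 18}). For comparison, here is a rough sketch of the same idea without frozensets, assuming the df1/df2 built above (with pandas imported as pd) and NumPy installed: sort each pair so both orderings collapse to one key, count the keys from df2, and map them back onto df1.
import numpy as np

# Sort each row so e.g. (27, 18) becomes (18, 27), then count the pairs in df2
pairs2 = pd.Series(list(map(tuple, np.sort(df2[['node1', 'node2']].to_numpy(), axis=1))))
counts = pairs2.value_counts()

# Build the same sorted keys for df1 and look up the counts
pairs1 = pd.Series(list(map(tuple, np.sort(df1[['source', 'target']].to_numpy(), axis=1))))
df1['count'] = pairs1.map(counts).fillna(0).astype(int)
Both versions give count = 2 for way_id 7 because the pair {18, 27} appears twice in df2, once as (18, 27) and once as (27, 18).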
Upvotes: 4