osint_alex
osint_alex

Reputation: 1022

How to get all unique combinations of values in one column that are in another column

Starting with a dataframe like this:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'b', 'b', 'a']})
   A  B
0  1  a
1  2  b
2  3  b
3  4  b
4  5  a

What is the best way of getting to a dataframe like this?

pd.DataFrame({'source': [1, 2, 2, 3], 'target': [5, 3, 4, 4]})
   source  target
0       1       5
1       2       3
2       2       4
3       3       4

For each time a row in column A has the same value in column B as another row in column A, I want to save the unique instances of that relationship in a new dataframe.

This is pretty close:

df.groupby('B')['A'].unique()
B
a       [1, 5]
b    [2, 3, 4]
Name: A, dtype: object

But I'd ideally convert it into a single dataframe now and my brain has gone kaput.

Upvotes: 2

Views: 557

Answers (2)

User1909203
User1909203

Reputation: 56

use merge function

df.merge(df, how = "outer", on = ["B"]).query("A_x < A_y")

Upvotes: 2

BENY
BENY

Reputation: 323266

In your case , you can do itertools.combinations

import itertools
s = df.groupby('B')['A'].apply(lambda x : set(list(itertools.combinations(x, 2)))).explode().tolist()
out = pd.DataFrame(s,columns=['source','target'])
out
Out[312]: 
   source  target
0       1       5
1       3       4
2       2       3
3       2       4

Upvotes: 4

Related Questions