Reputation: 115
Let's say I have the following dataframe:
index A B
-----------------
1 A1 B1
2 A1 B2
3 A1 B3
4 A2 B1
How do I write code that returns the pairs (Ax, By) where Ax is connected to more distinct Bs than By is connected to distinct As?
In this case it should return (A1, B1), because A1 is connected to 3 different Bs, while B1 is connected to only 2 different As.
Upvotes: 2
Views: 394
Reputation: 12493
Here's a way to do that (in a couple of steps, for clarity):
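# Drop duplicate (A, B) rows so each connection is counted only once
df = df.drop_duplicates()
# For each row, the number of Bs connected to that row's A
df["A_count"] = df.groupby("A")["B"].transform("count")
# For each row, the number of As connected to that row's B
df["B_count"] = df.groupby("B")["A"].transform("count")
# Keep the pairs where the A side has more connections than the B side
df[df.A_count > df.B_count]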
The output is:
    A   B  A_count  B_count
0  A1  B1        3        2
1  A1  B2        3        1
2  A1  B3        3        1
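As a minimal sketch, the same comparison can be written end to end with `nunique`, which counts distinct partners directly, so no separate deduplication step is needed for the counts. The sample frame is rebuilt from the question, and the names `a_count`, `b_count` and `result` are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["A1", "A1", "A1", "A2"],
    "B": ["B1", "B2", "B3", "B1"],
})

# Number of distinct Bs seen with each row's A
a_count = df.groupby("A")["B"].transform("nunique")
# Number of distinct As seen with each row's B
b_count = df.groupby("B")["A"].transform("nunique")

# Rows where the A side has strictly more distinct partners than the B side
result = df[a_count > b_count]
print(result)
```

Keeping the counts in separate Series leaves `df` unchanged, which may be convenient if the helper columns are not needed afterwards.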
Upvotes: 2
Reputation: 88226
We could treat this as a graph problem and check which nodes have a degree higher than 1. Then just keep the rows where both A and B are among those nodes:
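import networkx as nx
# Build an undirected graph with one edge per (A, B) row
G = nx.from_pandas_edgelist(df, source='A', target='B')
# Nodes that are connected to more than one other node
keep = [node for node, deg in G.degree() if deg > 1]
# Keep the rows where both A and B are in that list
df[df[['A', 'B']].isin(keep).all(axis=1)]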
   index   A   B
0      1  A1  B1
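For comparison, here is a hedged sketch that runs the degree-threshold filter on the question's sample data next to a variant that compares the two endpoint degrees row by row. The variant and the names `deg_a`, `deg_b`, `by_comparison` and `by_threshold` are assumptions added for illustration. The sketch also assumes that labels in column A never coincide with labels in column B, since the graph would otherwise merge them into one node.

```python
import networkx as nx
import pandas as pd

# Sample data from the question (the "index" column is omitted here)
df = pd.DataFrame({
    "A": ["A1", "A1", "A1", "A2"],
    "B": ["B1", "B2", "B3", "B1"],
})

# One undirected edge per (A, B) row
G = nx.from_pandas_edgelist(df, source="A", target="B")

# Degree of each endpoint, looked up row by row
degrees = dict(G.degree())
deg_a = df["A"].map(degrees)
deg_b = df["B"].map(degrees)

# Variant: rows where the A node has a strictly higher degree than the B node
by_comparison = df[deg_a > deg_b]

# Threshold filter: rows where both endpoints have degree > 1
keep = [node for node, deg in G.degree() if deg > 1]
by_threshold = df[df[["A", "B"]].isin(keep).all(axis=1)]

print(by_comparison)
print(by_threshold)
```

On this data the two filters select different rows, so which one fits depends on how the condition in the question is read.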
Upvotes: 5