Slajni

Reputation: 115

Get pairs of values satisfying condition based on mutual connection pandas

Let's say I have the following dataframe:

     index    A      B
     -----------------
      1      A1     B1
      2      A1     B2
      3      A1     B3
      4      A2     B1

How do I write code that returns the pairs (Ax, By) for which Ax is connected to more distinct Bs than By is connected to distinct As?

In this case it should return (A1, B1), because A1 is connected to 3 different Bs while B1 is connected to only 2 different As.

Upvotes: 2

Views: 394

Answers (2)

Roy2012

Reputation: 12493

Here's a way to do that (in a couple of steps, for clarity):

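# Drop duplicate (A, B) pairs in case there are any
df = df.drop_duplicates(subset=["A", "B"])

# Count the distinct partners of each A and of each B
df["A_count"] = df.groupby("A")["B"].transform("nunique")
df["B_count"] = df.groupby("B")["A"].transform("nunique")

# Keep the rows where A has strictly more partners than B
df[df.A_count > df.B_count]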

The output is:

    A   B  A_count  B_count
0  A1  B1        3        2
1  A1  B2        3        1
2  A1  B3        3        1
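
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the frame has only the columns A and B (the question's "index" column is left out), and the names a_count, b_count, pairs and result are just illustrative. It uses "nunique" so that repeated (A, B) rows would not inflate the counts:

```python
import pandas as pd

# Example data from the question (without the "index" column).
df = pd.DataFrame({
    "A": ["A1", "A1", "A1", "A2"],
    "B": ["B1", "B2", "B3", "B1"],
})

# Distinct Bs per A and distinct As per B, broadcast back onto every row.
a_count = df.groupby("A")["B"].transform("nunique")
b_count = df.groupby("B")["A"].transform("nunique")

# Keep each (A, B) pair once, where A has strictly more partners than B.
pairs = df.loc[a_count > b_count, ["A", "B"]].drop_duplicates()
result = list(pairs.itertuples(index=False, name=None))
print(result)
```

Note that under the strict "more than" reading, (A1, B2) and (A1, B3) qualify as well as (A1, B1), which is why this returns three pairs.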

Upvotes: 2

yatu

Reputation: 88226

We could treat this as a graph problem: build a graph with one edge per row and find the nodes whose degree is higher than 1. Then keep only the rows where both the A value and the B value are among those nodes:

import networkx as nx

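# One edge per row; a node's degree is its number of distinct neighbours
G = nx.from_pandas_edgelist(df, source='A', target='B')
keep = [node for node, deg in G.degree() if deg > 1]
# Keep the rows where both A and B have degree > 1
df[df[['A', 'B']].isin(keep).all(axis=1)]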

   index   A   B
0      1  A1  B1
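
The degree filter above is a bit different from comparing the two degrees directly, so here is a rough sketch that puts both readings side by side on the question's data. It assumes the labels in A and B never overlap (otherwise they would collapse into the same graph node), and the names edges, strict and both_gt_one are made up for illustration:

```python
import networkx as nx
import pandas as pd

# Example data from the question (without the "index" column).
df = pd.DataFrame({
    "A": ["A1", "A1", "A1", "A2"],
    "B": ["B1", "B2", "B3", "B1"],
})

# A simple Graph stores each edge once, so degree = number of distinct neighbours.
G = nx.from_pandas_edgelist(df, source="A", target="B")

# Unique (A, B) pairs as plain tuples.
edges = list(df[["A", "B"]].drop_duplicates().itertuples(index=False, name=None))

# Reading 1: A has strictly more neighbours than B.
strict = [(a, b) for a, b in edges if G.degree(a) > G.degree(b)]

# Reading 2 (the filter used in this answer): both endpoints have degree > 1.
both_gt_one = [(a, b) for a, b in edges if G.degree(a) > 1 and G.degree(b) > 1]

print(strict)
print(both_gt_one)
```

Only the second reading reproduces the single pair (A1, B1) that the question expects; the strict comparison also lets (A1, B2) and (A1, B3) through.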

Upvotes: 5
