SquirreledHogs

Reputation: 13

Link lists that share common elements

I have an issue similar to this one, with a few differences/complications.

I have a list of groups containing members. Rather than merging the groups that share members, I need to preserve the groupings and create a new set of edges based on which groups have members in common, and do so conditionally based on attributes of the groups.

The source data looks like this:

+----------+------------+-----------+
| Group ID | Group Type | Member ID |
+----------+------------+-----------+
| A        | Type 1     |         1 |
| A        | Type 1     |         2 |
| B        | Type 1     |         2 |
| B        | Type 1     |         3 |
| C        | Type 1     |         3 |
| C        | Type 1     |         4 |
| D        | Type 2     |         4 |
| D        | Type 2     |         5 |
+----------+------------+-----------+

Desired output is this:

+----------+-----------------+
| Group ID | Linked Group ID |
+----------+-----------------+
| A        | B               |
| B        | C               |
+----------+-----------------+

A is linked to B because they have member 2 in common. B is linked to C because they have member 3 in common. C is not linked to D: they have member 4 in common but are of different types.

The number of shared members doesn't matter for my purposes; a single member in common means the groups are linked.

The output is being used as the edges of a graph, so if the output is a graph that fits the rules, that's fine.

The source dataset is large (hundreds of millions of rows), so performance is a consideration.

This poses a similar question; however, I'm new to Python and can't figure out how to get the source data to a point where I can use the answer, or how to work in the additional requirement that the group types match.

Upvotes: 1

Views: 85

Answers (1)

Sushant

Reputation: 180

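Try something like this:

# For each (Group Type, Member ID), join the IDs of the groups containing that member.
df1 = df.groupby(['Group Type', 'Member ID'])['Group ID'].apply(','.join).reset_index()
# Keep only members that belong to more than one group of the same type.
df2 = df1[df1['Group ID'].str.contains(',')]

This gives, for each shared member, a comma-separated list of the same-type groups that contain it, rather than pairwise edges. It might not handle the case of cyclic grouping.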
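
If you need the pairwise edge list from the question rather than comma-joined strings, one possible approach is a self-merge on group type and member. This is only a minimal sketch: it assumes the data fits in memory as a pandas DataFrame with the column names from the question, and the function name `linked_groups` is made up for illustration.

```python
import pandas as pd


def linked_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per pair of same-type groups sharing at least one member."""
    # Drop duplicate memberships so the self-join stays as small as possible.
    members = df[['Group ID', 'Group Type', 'Member ID']].drop_duplicates()
    # Self-join on type and member: each row pairs two groups of the same
    # type that contain the same member.
    pairs = members.merge(members, on=['Group Type', 'Member ID'],
                          suffixes=('', '_linked'))
    # Keep each unordered pair once (A-B, but not B-A or A-A).
    pairs = pairs[pairs['Group ID'] < pairs['Group ID_linked']]
    # Several shared members produce the same pair, so deduplicate the edges.
    return (pairs[['Group ID', 'Group ID_linked']]
            .drop_duplicates()
            .rename(columns={'Group ID_linked': 'Linked Group ID'})
            .sort_values(['Group ID', 'Linked Group ID'])
            .reset_index(drop=True))


# Sample data copied from the question.
df = pd.DataFrame({
    'Group ID': list('AABBCCDD'),
    'Group Type': ['Type 1'] * 6 + ['Type 2'] * 2,
    'Member ID': [1, 2, 2, 3, 3, 4, 4, 5],
})
edges = linked_groups(df)
print(edges)
```

On the sample data this produces the two edges A-B and B-C; C and D are never paired because the merge key includes the group type. Be aware that the self-join grows roughly with the square of the number of same-type groups per member, so with hundreds of millions of rows you would probably have to process the data in pieces (for example one group type at a time) or use a tool built for larger-than-memory joins.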

Upvotes: 1
