I have an issue similar to this one, with a few differences and complications.
I have a list of groups containing members. Rather than merging the groups that share members, I need to preserve the groupings and create a new set of edges based on which groups have members in common. I also need to do this conditionally, based on attributes of the groups (here, whether they have the same Group Type).
The source data looks like this:
+----------+------------+-----------+
| Group ID | Group Type | Member ID |
+----------+------------+-----------+
| A        | Type 1     | 1         |
| A        | Type 1     | 2         |
| B        | Type 1     | 2         |
| B        | Type 1     | 3         |
| C        | Type 1     | 3         |
| C        | Type 1     | 4         |
| D        | Type 2     | 4         |
| D        | Type 2     | 5         |
+----------+------------+-----------+
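If the source data is stored as a CSV file with these three columns (an assumption; the question doesn't say how it is stored), it can be loaded into a pandas DataFrame as shown below. The sample table is inlined through `io.StringIO` so the snippet runs on its own; for a real file, pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# The sample table from above as CSV text (a stand-in for the real source file)
SAMPLE_CSV = """Group ID,Group Type,Member ID
A,Type 1,1
A,Type 1,2
B,Type 1,2
B,Type 1,3
C,Type 1,3
C,Type 1,4
D,Type 2,4
D,Type 2,5
"""

# For a real file: df = pd.read_csv("path/to/file.csv")  (hypothetical path)
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df)
```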
Desired output is this:
+----------+-----------------+
| Group ID | Linked Group ID |
+----------+-----------------+
| A        | B               |
| B        | C               |
+----------+-----------------+
A is linked to B because they share member 2.
B is linked to C because they share member 3.
C is not linked to D: they share member 4, but D is of a different type.
The number of shared members doesn't matter for my purposes; a single member in common means two groups are linked.
The output is being used as the edges of a graph, so if the output is a graph that fits these rules, that's fine too.
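One way to express both rules in pandas (a sketch, not the only approach) is a self-join on `Group Type` and `Member ID` together. Two rows only join when they describe the same member in groups of the same type, so the type rule is enforced by the join key itself. Keeping only pairs where the left ID sorts before the right ID drops self-pairs and mirrored duplicates, and `drop_duplicates` collapses groups that share several members into a single edge. The sample data is rebuilt inline so the snippet is self-contained.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Group ID":   ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Group Type": ["Type 1"] * 6 + ["Type 2"] * 2,
    "Member ID":  [1, 2, 2, 3, 3, 4, 4, 5],
})

# Self-join: pair every row with every row that has the same type AND member
pairs = df.merge(df, on=["Group Type", "Member ID"], suffixes=("", " Linked"))

# Keep each unordered pair once and drop self-pairs (A-A) by requiring left < right
pairs = pairs[pairs["Group ID"] < pairs["Group ID Linked"]]

# Several shared members should still produce a single edge
edges = (
    pairs[["Group ID", "Group ID Linked"]]
    .drop_duplicates()
    .rename(columns={"Group ID Linked": "Linked Group ID"})
    .reset_index(drop=True)
)
print(edges)
```

Note that a member shared by k groups produces k² joined rows before filtering, so the intermediate result can be much larger than the input when some members belong to many groups.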
The source dataset is large (hundreds of millions of rows), so performance is a consideration
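At this scale it may help to avoid materializing every row-to-row pair. One option, sketched here in plain Python and assuming the rows can be streamed as `(group_id, group_type, member_id)` tuples (for example from `csv.reader`), is to bucket groups by `(type, member)` and emit each pair of groups within a bucket once. The function name `linked_groups` is hypothetical. It still keeps one entry per distinct `(type, member)` key plus the set of emitted edges in memory, so at hundreds of millions of rows it may need to be run per Group Type, or the same idea pushed into a database `GROUP BY`.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def linked_groups(rows):
    """Yield each (group, linked_group) edge once from (group_id, group_type, member_id) rows."""
    # (type, member) -> groups containing that member; groups of different
    # types never share a key, which enforces the "same type" rule
    groups_by_key = defaultdict(set)
    for group_id, group_type, member_id in rows:
        groups_by_key[(group_type, member_id)].add(group_id)

    seen = set()
    for groups in groups_by_key.values():
        # sorted() gives each pair a stable left < right orientation
        for pair in combinations(sorted(groups), 2):
            if pair not in seen:  # several shared members -> still one edge
                seen.add(pair)
                yield pair


# Usage with the sample rows (no header line)
SAMPLE = """A,Type 1,1
A,Type 1,2
B,Type 1,2
B,Type 1,3
C,Type 1,3
C,Type 1,4
D,Type 2,4
D,Type 2,5
"""
edges = list(linked_groups(csv.reader(io.StringIO(SAMPLE))))
print(edges)  # [('A', 'B'), ('B', 'C')]
```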
This question poses a similar problem; however, I'm new to Python and can't figure out how to get the source data into a form where I can use its answer, or how to work in the additional requirement that the group types match.
Upvotes: 1
Reputation: 180
Try something like this:
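import pandas as pd
# df holds the source table; join the Group IDs per (type, member), keep keys shared by 2+ groups
df1 = df.groupby(['Group Type', 'Member ID'])['Group ID'].apply(','.join).reset_index()
df2 = df1[df1['Group ID'].str.contains(',', regex=False)]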
This might not handle the case of cyclic grouping.
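The code above stops at comma-joined strings such as `A,B`. To turn those into the edge list the question asks for, each string can be split and expanded into every pair of groups, so a member shared by three groups yields three edges (A-B, A-C, B-C). A sketch, rebuilding the sample data so it runs standalone; it assumes Group IDs never contain commas themselves.

```python
from itertools import combinations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Group ID":   ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Group Type": ["Type 1"] * 6 + ["Type 2"] * 2,
    "Member ID":  [1, 2, 2, 3, 3, 4, 4, 5],
})

# The two steps above: comma-join the groups per (type, member), keep shared keys
df1 = df.groupby(["Group Type", "Member ID"])["Group ID"].apply(",".join).reset_index()
df2 = df1[df1["Group ID"].str.contains(",", regex=False)]

# Expand each "A,B,C" string into all pairs, then de-duplicate via a set
edges = sorted({
    pair
    for joined in df2["Group ID"]
    for pair in combinations(sorted(joined.split(",")), 2)
})
print(edges)  # [('A', 'B'), ('B', 'C')]
```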
Upvotes: 1