Johan
Johan

Reputation: 226

generate group col using names in two cols (reflecting ties)?

In python, how can I create a column whose values represent groups, where groups are based on ties indicated by the names of persons (col1) and associates (col2)?

The dataset is longitudinal / panel / time-series. There are many missing values in the associate column. But for some persons, there are many values in the associate col. Associates are probably linked through persons, and this is what I hope the group column will clarify. I.e. there are only needed as many groups, so as to group persons that can be linked through other persons somehow.

How do I go about handling this? It seems to involve at least one loop/iteration, which I am not awfully skilled at writing.

index person associate desired column: group
1 name1 - 1
2 name1 - 1
3 name1 - 1
1 name2 name1 1
2 name2 name4 1
3 name2 name3 1
1 name3 - 1
2 name3 - 1
3 name3 - 1
1 name4 - 1
2 name4 - 1
3 name4 - 1
1 name5 - -
2 name5 - -
3 name5 - -

Upvotes: 1

Views: 39

Answers (1)

mozway
mozway

Reputation: 260430

You need to find the connected_components in a graph, use networkx, then map you groups:

import networkx as nx

G = nx.from_pandas_edgelist(df, source='person', target='associate')
G.remove_node('-')

groups = {n: i if len(g)>1 else '-' for i, g in
          enumerate(nx.connected_components(G), start=1)
          for n in g}

df['group'] = df['person'].map(groups)

Output:

    index person associate group
0       1  name1         -     1
1       2  name1         -     1
2       3  name1         -     1
3       1  name2     name1     1
4       2  name2     name4     1
5       3  name2     name3     1
6       1  name3         -     1
7       2  name3         -     1
8       3  name3         -     1
9       1  name4         -     1
10      2  name4         -     1
11      3  name4         -     1
12      1  name5         -     -
13      2  name5         -     -
14      3  name5         -     -

Your data as a graph:

graph

Upvotes: 2

Related Questions