h3rmit

Reputation: 73

Creating an edge list from a pandas dataframe

I'd like to create an edge list with a weight attribute that counts pair occurrences (e.g., how many months the pair a-b has been together in the same group).

The dataframe contains a monthly snapshot of the people in a particular team (there are no duplicates within a monthly group):

monthyear name
jun2020 a
jun2020 b
jun2020 c
jul2020 a
jul2020 b
jul2020 d

The output should look like the following (it's non-directional so a-b pair is the same as b-a):

node1 node2 weight
a b 2
b c 1
a c 1
a d 1
b d 1

I managed to create a new dataframe with all name combinations using the following:

df1 = pd.DataFrame(data=list(combinations(df['name'].unique().tolist(), 2)), columns=['node1', 'node2'])

Now I'm not sure how to iterate over this new dataframe to populate the weights. How can this be done?

Upvotes: 7

Views: 820

Answers (1)

Shaido

Reputation: 28322

Assuming that there are no duplicates within each monthyear group, you can get all 2-combinations of names within each group and then group by the node names to obtain the weight.

from itertools import combinations

import pandas as pd

def get_combinations(group):
    # Sort each pair so that (a, b) and (b, a) map to the same edge.
    pairs = [sorted(pair) for pair in combinations(group['name'].values, 2)]
    return pd.DataFrame(pairs, columns=['node1', 'node2'])

df = df.groupby('monthyear').apply(get_combinations)

This will give you an intermediate result:

            node1 node2
monthyear              
jul2020   0     a     b
          1     a     d
          2     b     d
jun2020   0     a     b
          1     a     c
          2     b     c

Now, calculate the weight:

df = df.groupby(['node1', 'node2']).size().to_frame('weight').reset_index()

Final result:

  node1 node2  weight
0     a     b       2
1     a     c       1
2     a     d       1
3     b     c       1
4     b     d       1
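
For reference, here is a self-contained sketch that rebuilds the sample data from the question and produces the same edge list. Note that it swaps in a different technique from the snippets above: instead of `groupby().apply()` followed by a second `groupby`, it counts the pairs with `collections.Counter`. The variable names `pair_counts` and `edges` are my own, and it relies on the same assumption that a name appears at most once per month.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "monthyear": ["jun2020"] * 3 + ["jul2020"] * 3,
    "name": ["a", "b", "c", "a", "b", "d"],
})

pair_counts = Counter()
for _, names in df.groupby("monthyear")["name"]:
    # Sorting the names first means every pair comes out as (smaller, larger),
    # so a-b and b-a are counted as the same undirected edge.
    pair_counts.update(combinations(sorted(names), 2))

# Turn the counter into an edge list, ordered by node names.
edges = pd.DataFrame(
    [(n1, n2, w) for (n1, n2), w in sorted(pair_counts.items())],
    columns=["node1", "node2", "weight"],
)
print(edges)
```

This should print the same five rows as the final result above. Keeping the counts in a plain `Counter` may also be convenient if the edges are going to be passed on to a graph library rather than kept in a dataframe.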

Upvotes: 3
