Reputation: 73
I'd like to create an edge list with a weight attribute that counts how often each pair occurs together (e.g., for how many months the pair a-b has been in the same group).
The dataframe contains a monthly snapshot of the people in a particular team (there are no duplicate names within a monthly group).
monthyear | name |
---|---|
jun2020 | a |
jun2020 | b |
jun2020 | c |
jul2020 | a |
jul2020 | b |
jul2020 | d |
The output should look like the following (the graph is undirected, so the pair a-b is the same as b-a):
node1 | node2 | weight |
---|---|---|
a | b | 2 |
b | c | 1 |
a | c | 1 |
a | d | 1 |
b | d | 1 |
I managed to create a new dataframe with all name combinations using the following:
df1 = pd.DataFrame(data=list(combinations(df['name'].unique().tolist(), 2)), columns=['node1', 'node2'])
Now I'm not sure how to iterate over this new dataframe to populate the weights. How can this be done?
Upvotes: 7
Views: 820
Reputation: 28322
Assuming that there are no duplicates within each monthyear group, you can get all 2-combinations of names within each group and then group by the node names to obtain the weight.
from itertools import combinations
import pandas as pd

def get_combinations(group):
    # Sort each pair so that a-b and b-a map to the same edge
    pairs = [sorted(pair) for pair in combinations(group['name'].values, 2)]
    return pd.DataFrame(pairs, columns=['node1', 'node2'])

df = df.groupby('monthyear').apply(get_combinations)
This will give you an intermediate result:
            node1 node2
monthyear
jul2020   0     a     b
          1     a     d
          2     b     d
jun2020   0     a     b
          1     a     c
          2     b     c
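To illustrate why each pair is sorted before counting, here is a minimal sketch using only `itertools`. The lists `jun` and `jul` and the helper `sorted_pairs` are hypothetical names made up for this illustration, and `jul` is deliberately given in a different order than in the question's table. Without sorting, the same two people could show up as both `('a', 'b')` and `('b', 'a')` and would be counted as two different edges.

```python
from itertools import combinations

# Hypothetical monthly groups; the order of names in "jul" is shuffled on purpose
jun = ["a", "b", "c"]
jul = ["b", "a", "d"]

def sorted_pairs(names):
    # Build every 2-combination and sort each pair alphabetically,
    # so an undirected edge always has the same (node1, node2) form
    return [tuple(sorted(pair)) for pair in combinations(names, 2)]

raw_jul_pairs = list(combinations(jul, 2))  # order depends on input order
jun_pairs = sorted_pairs(jun)
jul_pairs = sorted_pairs(jul)

print(raw_jul_pairs)
print(jun_pairs)
print(jul_pairs)
```

After sorting, the pair a-b has the same representation in both months, which is what lets the later group-by count it twice.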
Now, calculate the weight:
df = df.groupby(['node1', 'node2']).size().to_frame('weight').reset_index()
Final result:
  node1 node2  weight
0     a     b       2
1     a     c       1
2     a     d       1
3     b     c       1
4     b     d       1
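For reference, the two steps can be put together into one self-contained sketch that rebuilds the sample data from the question. This is a variation rather than the exact code above: it loops over the groups explicitly instead of calling `groupby(...).apply(...)`, and the helper name `pair_rows` is made up for this example. It still assumes there are no duplicate names within a month.

```python
from itertools import combinations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "monthyear": ["jun2020"] * 3 + ["jul2020"] * 3,
    "name": ["a", "b", "c", "a", "b", "d"],
})

def pair_rows(frame):
    # Collect one (node1, node2) row per pair per month.
    # Sorting the names first guarantees node1 <= node2 in every pair.
    rows = []
    for _, names in frame.groupby("monthyear")["name"]:
        rows.extend(combinations(sorted(names), 2))
    return pd.DataFrame(rows, columns=["node1", "node2"])

pairs = pair_rows(df)

# Count how many months each pair appears in
edges = (
    pairs.groupby(["node1", "node2"])
    .size()
    .to_frame("weight")
    .reset_index()
)
print(edges)
```

The resulting `edges` frame has one row per undirected pair, with `weight` equal to the number of months the two names shared a group.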
Upvotes: 3