Reputation: 129
I have a dataframe of authors:
id|author_1|author_2|author_3|author_4|author_5
1|Joe M |Sally K |Terry O
2|Jack T |Mike K
I want to create a data frame with all the possible connections between the authors. The dataframe should look like this:
id|source |target |
1|Joe M |Sally K |
1|Joe M |Terry O |
1|Sally K |Terry O |
2|Jack T |Mike K |
The goal is to list every unique connection between the authors. A given ID can have up to 5 authors, and I want every possible pair among them. I also want to avoid duplicates, because A - B is the same connection as B - A.
I am not sure how to start.
Upvotes: 0
Views: 37
Reputation: 862641
The idea is to create all combinations of length 2 from every column except id (skipping missing values), and pass the result to the DataFrame constructor:
from itertools import combinations
import pandas as pd

# Transpose so each id becomes a column, then pair up its non-missing authors
L = [(k,) + x for k, v in df.set_index('id').T.items() for x in combinations(v.dropna(), 2)]
df = pd.DataFrame(L, columns=['id','source','target'])
print(df)
   id   source   target
0   1    Joe M  Sally K
1   1    Joe M  Terry O
2   1  Sally K  Terry O
3   2   Jack T   Mike K
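For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The `author_pairs` helper and the `sample` frame are illustrative names, not part of the question, and the sketch assumes empty author cells are stored as missing values (None/NaN) and that no name repeats within a row. It also strips surrounding whitespace from names, which is an extra step the one-liner above does not do.

```python
from itertools import combinations

import pandas as pd


def author_pairs(frame, id_col="id"):
    """Return one row per unordered author pair within each id (hypothetical helper)."""
    rows = []
    for row_id, authors in frame.set_index(id_col).iterrows():
        # Drop missing cells, strip padding, and discard empty strings.
        names = authors.dropna().str.strip()
        names = names[names != ""]
        # combinations() yields each unordered pair once, so A-B never reappears as B-A.
        for source, target in combinations(names, 2):
            rows.append((row_id, source, target))
    return pd.DataFrame(rows, columns=[id_col, "source", "target"])


# Sample data rebuilt from the question; None marks an empty author slot (assumption).
sample = pd.DataFrame({
    "id": [1, 2],
    "author_1": ["Joe M", "Jack T"],
    "author_2": ["Sally K", "Mike K"],
    "author_3": ["Terry O", None],
    "author_4": [None, None],
    "author_5": [None, None],
})

pairs = author_pairs(sample)
print(pairs)
```

With n authors on a row this produces n * (n - 1) / 2 pairs, so at most 10 rows per id when all 5 author columns are filled.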
Upvotes: 2