Py.rookie89

Reputation: 129

Creating dataframe with all the author connections

I have a dataframe of authors:

  id|author_1|author_2|author_3|author_4|author_5
   1|Joe M   |Sally K |Terry O 
   2|Jack T  |Mike  K 

I want to create a data frame with all the possible connections between the authors. The dataframe should look like this:

  id|source  |target  |
   1|Joe M   |Sally K | 
   1|Joe M   |Terry O | 
   1|Sally K |Terry O |
   2|Jack T  |Mike  K |

The goal is to list every unique connection between the authors. A given ID can have up to 5 authors, and I want every possible pair among them. I also want to avoid duplicates, because A - B is the same connection as B - A.

I am not sure how to start.

Upvotes: 0

Views: 37

Answers (1)

jezrael

Reputation: 862641

The idea is to create all combinations of length 2 from every column except id, then pass them to the DataFrame constructor:

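from itertools import combinations

import pandas as pd

# After transposing, each original row is a Series keyed by its id;
# drop the empty author slots, then prepend the id to every 2-author combination.
L = [(k,) + x for k, v in df.set_index('id').T.items() for x in combinations(v.dropna(), 2)]

df = pd.DataFrame(L, columns=['id','source','target'])
print(df)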
   id   source   target
0   1    Joe M  Sally K
1   1    Joe M  Terry O
2   1  Sally K  Terry O
3   2   Jack T   Mike K
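
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is reconstructed from the question, and the names `authors`, `author_pairs` and `edges` are only illustrative. It also assumes empty author slots are stored as NaN, and it sorts the names within each id so that A - B and B - A always come out in the same order:

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: empty slots are NaN).
authors = pd.DataFrame({
    "id": [1, 2],
    "author_1": ["Joe M", "Jack T"],
    "author_2": ["Sally K", "Mike K"],
    "author_3": ["Terry O", np.nan],
    "author_4": [np.nan, np.nan],
    "author_5": [np.nan, np.nan],
})


def author_pairs(frame):
    """Return one row per unordered author pair within each id."""
    rows = []
    for row_id, names in frame.set_index("id").iterrows():
        # Drop empty slots, remove repeated names, and sort so that
        # (A, B) and (B, A) are always emitted as the same pair.
        present = sorted(set(names.dropna()))
        for source, target in combinations(present, 2):
            rows.append((row_id, source, target))
    return pd.DataFrame(rows, columns=["id", "source", "target"])


edges = author_pairs(authors)
print(edges)
```

Note that the sorting changes which name lands in source and which in target compared with the original column order, so skip it if the column order matters to you.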

Upvotes: 2
