Reputation: 2520
I read multiple questions, but do not see a solution or missing something
I have a dataframe
Author_id Article_id Rank
100 10 1
101 10 2
102 10 3
100 11 1
105 11 2
106 11 3
Expected output:
Author_id1 Author_id2 Article_id Rank
100 101 10 1
100 102 10 1
100 105 11 1
100 106 11 1
101 100 10 2
101 102 10 2
102 100 10 3
102 101 10 3
105 100 11 2
105 106 11 2
106 100 11 3
106 105 11 3
If my understanding of the the desired dataset is not correct, please advise on how to restructure this data. Thanks!
Upvotes: 1
Views: 92
Reputation: 59549
If the DataFrame isn't duplicated on ['Author_id', 'Article_id']
, you can do a self-merge within article, and filter out the rows that merged with themself.
df1 = (df.merge(df.drop(columns='Rank'), on='Article_id', suffixes = ['1', '2'])
.query('Author_id1 != Author_id2')
.sort_values(['Author_id1', 'Rank'])
.reset_index(drop=True))
Author_id1 Article_id Rank Author_id2
0 100 10 1 101
1 100 10 1 102
2 100 11 1 105
3 100 11 1 106
4 101 10 2 100
5 101 10 2 102
6 102 10 3 100
7 102 10 3 101
8 105 11 2 100
9 105 11 2 106
10 106 11 3 100
11 106 11 3 105
Upvotes: 2