Matthew Barlowe
Matthew Barlowe

Reputation: 2328

Join where keys are not equal in Pandas

I have a dataframe like this:

data = {'teamid': [1, 2, 3, 4], 'gameid': [1, 1, 2, 2], 'rebounds': [20, 35, 43, 15]}
game_df = pd.DataFrame(data=data)
print(game_df)

   teamid  gameid  rebounds
0       1       1        20
1       2       1        35
2       3       2        43
3       4       2        15

I would like to join it to it self to produce a dataframe like this:

wanted_data = {'teamid': [1, 2, 3, 4], 'gameid': [1, 1, 2, 2], 'rebounds': [20, 35, 43, 15],
               'teamid_opponent': [2, 1, 4, 3], 'rebound_opponent': [35, 20, 15, 43]}
wanted_df = pd.DataFrame(data=wanted_data)
print(wanted_df)

   teamid  gameid  rebounds  teamid_opponent  rebound_opponent
0       1       1        20                2                35
1       2       1        35                1                20
2       3       2        43                4                15
3       4       2        15                3                43

In SQL I would just do something like this:

SELECT * from game_df df1 join game_df df2 on df1.gameid = df2.gameid and df1.teamid != df2.teamid

But i haven't been able to find anything in the pandas docs or on here for a way to replicate this in pandas itself. I looked on here and found this link but it isn't quite the same as what I'm trying to do. I've only found examples of trying to join where keys are equal.

Upvotes: 3

Views: 1265

Answers (2)

Matthew Barlowe
Matthew Barlowe

Reputation: 2328

I will also add seeing the answer made me think of a different way to do this as well so I will post that for posterity

Yourdf=game_df.merge(game_df,on='gameid',suffixes =['','_opponent'])
Yourdf[Yourdf.teamid != Yourdf.teamid_opponent]

   teamid  gameid  rebounds  teamid_opponent  rebounds_opponent
1       1       1        20                2                 35
2       2       1        35                1                 20
5       3       2        43                4                 15
6       4       2        15                3                 43

Upvotes: 1

BENY
BENY

Reputation: 323326

Here is one way use merge

Yourdf=game_df.merge(game_df,on='gameid',suffixes =['','_opponent']).query('teamid!=teamid_opponent')
Out[42]: 
   teamid  gameid  rebounds  teamid_opponent  rebounds_opponent
1       1       1        20                2                 35
2       2       1        35                1                 20
5       3       2        43                4                 15
6       4       2        15                3                 43

Upvotes: 5

Related Questions