8-Bit Borges
8-Bit Borges

Reputation: 10033

Pandas - join two dataframes and add column values on a new column

I have two dataframes, home:

   home_team_api_id  home_points
0                262           37
1                263           14
2                264           31
3                265           29
4                266           38
5                267           26
6                275           35
7                276           37
8                277           34
9                280           35
10               282           46
11               284           36
12               285           41
13               290           23
14               292           26
15               293           32
16               294           18
17               354           26
18               356           30
19               373           27

and away:

away_team_api_id  away_points
0               262           34
1               263           13
2               264           20
3               265           15
4               266           26
5               267           15
6               275           23
7               276           29
8               277           20
9               280           18
10              282           22
11              284           23
12              285           29
13              290           14
14              292           16
15              293           21
16              294           13
17              354           26
18              356           11
19              373           23

I am trying to create a 'rank' dataframe, where I add home and away points for all teams and get a 'total_point' column, like so:

    rank = away.set_index('away_team_api_id').join(home.set_index('home_team_api_id'))
    rank['total_point'] = rank['away_points'] + rank['home_points']

But I'm getting:

                  away_points  home_points  total_point
away_team_api_id                                       
262                        34          NaN          NaN
263                        13          NaN          NaN
264                        20          NaN          NaN
265                        15          NaN          NaN
266                        26          NaN          NaN
267                        15          NaN          NaN
275                        23          NaN          NaN
276                        29          NaN          NaN
277                        20          NaN          NaN
280                        18          NaN          NaN
282                        22          NaN          NaN
284                        23          NaN          NaN
285                        29          NaN          NaN
290                        14          NaN          NaN
292                        16          NaN          NaN
293                        21          NaN          NaN
294                        13          NaN          NaN
354                        26          NaN          NaN
356                        11          NaN          NaN
373                        23          NaN          NaN

What am I missing?

Upvotes: 1

Views: 122

Answers (1)

Corralien
Corralien

Reputation: 120391

rank = pd.merge(home.set_index("home_team_api_id").rename_axis("team_api_id"),
                away.set_index("away_team_api_id").rename_axis("team_api_id"),
                on="team_api_id")

rank["total_points"] = rank["home_points"] + rank["away_points"]
>>> rank
             home_points  away_points  total_points
team_api_id
262                   37           34            71
263                   14           13            27
264                   31           20            51
265                   29           15            44
266                   38           26            64
267                   26           15            41
275                   35           23            58
276                   37           29            66
277                   34           20            54
280                   35           18            53
282                   46           22            68
284                   36           23            59
285                   41           29            70
290                   23           14            37
292                   26           16            42
293                   32           21            53
294                   18           13            31
354                   26           26            52
356                   30           11            41
373                   27           23            50

Bonus: ranking

rank["rank"] = rank["total_points"].rank(method="max", ascending=False).astype(int)
>>> rank.sort_values("rank")
             home_points  away_points  total_points  rank
team_api_id
262                   37           34            71     1
285                   41           29            70     2
282                   46           22            68     3
276                   37           29            66     4
266                   38           26            64     5
284                   36           23            59     6
275                   35           23            58     7
277                   34           20            54     8
293                   32           21            53    10
280                   35           18            53    10
354                   26           26            52    11
264                   31           20            51    12
373                   27           23            50    13
265                   29           15            44    14
292                   26           16            42    15
267                   26           15            41    17
356                   30           11            41    17
290                   23           14            37    18
294                   18           13            31    19
263                   14           13            27    20

Upvotes: 1

Related Questions