Geordi Alm
Geordi Alm

Reputation: 51

Pandas: one to many join on Data Frames with different indexes

I have a dataframe and a reference table with admin/op designations that I would like to join on "MNGR", which I believe is a one to many relationship. I don't need any of the other columns from DF2, but obviously need to reference "MNGR" as my index for the join.

Edit: Note that I've already been through Merging 101 and haven't been able to find an answer that applies here.

DF1 = EMPLID  MNGR        YEAR     V1     V2     V3     V4
      12      BOB         2012     x      y      z      a
      13      JIM         2013     x      y      z      a
      14      RHONDA      2012     x      y      z      a
      15      RHONDA      2012     x      y      z      a
      16      JIM         2012     x      y      z      a
      17      RHONDA      2012     x      y      z      a

DF2 = MNGR      ADMIN/OP        YEAR       TRACT
      BOB       ADMINISTRATIVE  2000       A
      JIM       OPERATIONS      2013       B
      RHONDA    ADMINISTRATIVE  2012       A

I've performed this operation in SQL years ago by doing a one-to-many join, but I can't seem to understand how this functionality translates to python:

DF1.join(DF2.set_index('MNGR'), on='MNGR')

This passes, but yields me with DF2 joined, and ADMIN/OP all null.

My expected result:

DF1 = EMPLID  MNGR        YEAR     V1     V2     V3     V4    ADMIN/OP
      12      BOB         2012     x      y      z      a     ADMINISTRATIVE
      13      JIM         2013     x      y      z      a     OPERATIONS
      14      RHONDA      2012     x      y      z      a     ADMINISTRATIVE
      15      RHONDA      2012     x      y      z      a     ADMINISTRATIVE
      16      JIM         2012     x      y      z      a     OPERATIONS
      17      RHONDA      2012     x      y      z      a     ADMINISTRATIVE

Upvotes: 1

Views: 1031

Answers (1)

noah
noah

Reputation: 2786

import pandas as pd

# sample dataframes
DF1 = pd.DataFrame({'EMPLID': [12, 13, 14, 15, 16, 17], 'MNGR': ['BOB', 'JIM', 'RHONDA', 'RHONDA', 'JIM', 'RHONDA'], 'YEAR': [2012, 2013, 2012, 2012, 2012, 2012], 'V1': ['x', 'x', 'x', 'x', 'x', 'x'], 'V2': ['y', 'y', 'y', 'y', 'y', 'y'], 'V3': ['z', 'z', 'z', 'z', 'z', 'z'], 'V4': ['a', 'a', 'a', 'a', 'a', 'a']})
DF2 = pd.DataFrame({'MNGR': ['BOB', 'JIM', 'RHONDA'], 'ADMIN/OP': ['ADMINISTRATIVE', 'OPERATIONS', 'ADMINISTRATIVE'], 'YEAR': [2000, 2013, 2012], 'TRACT': ['A', 'B', 'A']})

# verify there is not whitespace in the column names
DF1.columns = DF1.columns.str.strip()
DF2.columns = DF2.columns.str.strip()

# merge the dataframes
df = DF1.merge(DF2[["MNGR","ADMIN/OP"]], on='MNGR', how='left')

# display(df)
 EMPLID   MNGR  YEAR V1 V2 V3 V4       ADMIN/OP
     12    BOB  2012  x  y  z  a ADMINISTRATIVE
     13    JIM  2013  x  y  z  a     OPERATIONS
     14 RHONDA  2012  x  y  z  a ADMINISTRATIVE
     15 RHONDA  2012  x  y  z  a ADMINISTRATIVE
     16    JIM  2012  x  y  z  a     OPERATIONS
     17 RHONDA  2012  x  y  z  a ADMINISTRATIVE

Upvotes: 1

Related Questions