Reputation: 51
I have a dataframe and a reference table with admin/op designations that I would like to join on "MNGR", which I believe is a one to many relationship. I don't need any of the other columns from DF2, but obviously need to reference "MNGR" as my index for the join.
Edit: Note that I've already been through Merging 101 and haven't been able to find an answer that applies here.
DF1 = EMPLID MNGR YEAR V1 V2 V3 V4
12 BOB 2012 x y z a
13 JIM 2013 x y z a
14 RHONDA 2012 x y z a
15 RHONDA 2012 x y z a
16 JIM 2012 x y z a
17 RHONDA 2012 x y z a
DF2 = MNGR ADMIN/OP YEAR TRACT
BOB ADMINISTRATIVE 2000 A
JIM OPERATIONS 2013 B
RHONDA ADMINISTRATIVE 2012 A
I've performed this operation in SQL years ago by doing a one-to-many join, but I can't seem to understand how this functionality translates to python:
DF1.join(DF2.set_index('MNGR'), on='MNGR')
This passes, but yields me with DF2 joined, and ADMIN/OP all null.
My expected result:
DF1 = EMPLID MNGR YEAR V1 V2 V3 V4 ADMIN/OP
12 BOB 2012 x y z a ADMINISTRATIVE
13 JIM 2013 x y z a OPERATIONS
14 RHONDA 2012 x y z a ADMINISTRATIVE
15 RHONDA 2012 x y z a ADMINISTRATIVE
16 JIM 2012 x y z a OPERATIONS
17 RHONDA 2012 x y z a ADMINISTRATIVE
Upvotes: 1
Views: 1031
Reputation: 2786
pandas.DataFrame.merge
or pandas.merge
pandas.DataFrame.join
is for combining dataframes on the index.import pandas as pd
# sample dataframes
DF1 = pd.DataFrame({'EMPLID': [12, 13, 14, 15, 16, 17], 'MNGR': ['BOB', 'JIM', 'RHONDA', 'RHONDA', 'JIM', 'RHONDA'], 'YEAR': [2012, 2013, 2012, 2012, 2012, 2012], 'V1': ['x', 'x', 'x', 'x', 'x', 'x'], 'V2': ['y', 'y', 'y', 'y', 'y', 'y'], 'V3': ['z', 'z', 'z', 'z', 'z', 'z'], 'V4': ['a', 'a', 'a', 'a', 'a', 'a']})
DF2 = pd.DataFrame({'MNGR': ['BOB', 'JIM', 'RHONDA'], 'ADMIN/OP': ['ADMINISTRATIVE', 'OPERATIONS', 'ADMINISTRATIVE'], 'YEAR': [2000, 2013, 2012], 'TRACT': ['A', 'B', 'A']})
# verify there is not whitespace in the column names
DF1.columns = DF1.columns.str.strip()
DF2.columns = DF2.columns.str.strip()
# merge the dataframes
df = DF1.merge(DF2[["MNGR","ADMIN/OP"]], on='MNGR', how='left')
# display(df)
EMPLID MNGR YEAR V1 V2 V3 V4 ADMIN/OP
12 BOB 2012 x y z a ADMINISTRATIVE
13 JIM 2013 x y z a OPERATIONS
14 RHONDA 2012 x y z a ADMINISTRATIVE
15 RHONDA 2012 x y z a ADMINISTRATIVE
16 JIM 2012 x y z a OPERATIONS
17 RHONDA 2012 x y z a ADMINISTRATIVE
Upvotes: 1