Reputation: 2074
I have two Dataframes with different column names I want to join on. I have not found an example of a join with two different names or using a column index.
fileDf = pd.read_csv(fileMeta, sep=delimiter, error_bad_lines=False)
lakeDf = pd.read_csv(lakeData, sep=delimiter, error_bad_lines=False)
In fileDF I have index 7 or DATASET_ID. In lakeDF I have TBL_ID or Index 0 I want to join on. How can I do it when there are two different names/indexs?
Edit: still isn't merged, here is sample from .head()
-----------------file DF-------------------
FILE_ID CREATION_DT_TM entityType FILE_LOCATION OWNER \
0 268159 2015/08/12T03:52:58 File /src/impacs/im_acct.dat/ null
1 268165 2015/08/22T05:03:51 File /src/impacs/im_acct.dat/ null
2 268173 2015/07/18T04:48:12 File /src/impacs/im_acct.dat/ null
3 268177 2015/09/09T05:17:03 File /src/impacs/im_acct.dat/ null
4 268179 2015/07/28T04:31:06 File /src/impacs/im_acct.dat/ null
PUBLICATION_DT_TM FILE_SIZE DATASET_ID last_modified NAV_ID \
0 null 22095504000 578748 2015-10-07 08:16:48.0 null
1 null 22169091000 578748 2015-10-07 08:16:48.0 null
2 null 21933828000 578748 2015-10-07 08:16:48.0 null
3 null 22279575000 578748 2015-10-07 08:16:48.0 null
4 null 21994644000 578748 2015-10-07 08:16:48.0 null
[5 rows x 22 columns]
-----------------lake DF ------------------
TBL_ID TBL_ADDITIONAL_INFO TBL_API TBL_ARCHIVECRITERIA \
0 576781 null null null
1 576782 null null null
2 576783 null false null
3 576784 null null null
4 576785 null false null
Upvotes: 1
Views: 144
Reputation: 394179
You can merge on different columns for lhs and rhs, additionally you may need to specify the type of merge in your case as you had duplicate values on the lhs then you can elect to perform a 'left' merge
:
fileDf.merge(lakeDf. left_on='DATESET_ID', right_index=True, how='left')
Upvotes: 1