theMadKing
theMadKing

Reputation: 2074

Python DataFrame Join on Specific Column Index?

I have two Dataframes with different column names I want to join on. I have not found an example of a join with two different names or using a column index.

fileDf = pd.read_csv(fileMeta, sep=delimiter, error_bad_lines=False)
lakeDf = pd.read_csv(lakeData, sep=delimiter, error_bad_lines=False)

In fileDF I have index 7 or DATASET_ID. In lakeDF I have TBL_ID or Index 0 I want to join on. How can I do it when there are two different names/indexs?

Edit: still isn't merged, here is sample from .head()

-----------------file DF-------------------
   FILE_ID       CREATION_DT_TM entityType             FILE_LOCATION OWNER  \
0   268159  2015/08/12T03:52:58       File  /src/impacs/im_acct.dat/  null
1   268165  2015/08/22T05:03:51       File  /src/impacs/im_acct.dat/  null
2   268173  2015/07/18T04:48:12       File  /src/impacs/im_acct.dat/  null
3   268177  2015/09/09T05:17:03       File  /src/impacs/im_acct.dat/  null
4   268179  2015/07/28T04:31:06       File  /src/impacs/im_acct.dat/  null

  PUBLICATION_DT_TM    FILE_SIZE  DATASET_ID          last_modified NAV_ID  \
0              null  22095504000      578748  2015-10-07 08:16:48.0   null
1              null  22169091000      578748  2015-10-07 08:16:48.0   null
2              null  21933828000      578748  2015-10-07 08:16:48.0   null
3              null  22279575000      578748  2015-10-07 08:16:48.0   null
4              null  21994644000      578748  2015-10-07 08:16:48.0   null


[5 rows x 22 columns]
-----------------lake DF ------------------
   TBL_ID TBL_ADDITIONAL_INFO TBL_API TBL_ARCHIVECRITERIA  \
0  576781                null    null                null
1  576782                null    null                null
2  576783                null   false                null
3  576784                null    null                null
4  576785                null   false                null

Upvotes: 1

Views: 144

Answers (1)

EdChum
EdChum

Reputation: 394179

You can merge on different columns for lhs and rhs, additionally you may need to specify the type of merge in your case as you had duplicate values on the lhs then you can elect to perform a 'left' merge:

fileDf.merge(lakeDf. left_on='DATESET_ID', right_index=True, how='left')

Upvotes: 1

Related Questions