Ren Lyke
Ren Lyke

Reputation: 283

Error while using Merge function in pandas between two excel files instead of vlookup (Key ERROR)

I am trying to merge to excel file in Pandas.

import pandas as pd
import numpy as np
upload_raw = pd.read_excel(r'C:\Users\Desktop\Upload Raw Data.xlsx',
                     sheet_name = 'Upload',
                     header = 0,
                     index_col = 0,
                     )
mapping = pd.read_excel(r'C:\Users\Desktop\Mapping.xlsx',
                     sheet_name = 'Mapping',
                     header = 0,
                     index_col = 0,
                     )
display(upload_raw)
display(mapping)
upload_lookup=upload_raw.merge(mapping,on ='BRANCH',how = 'outer' )
display(upload_lookup)

I continue to get the KeyError: 'BRANCH'. I checked the values of the columns all are text. The Mapping file has 3 columns while the upload has around 4 columns.

Upload Raw data

BRANCH  DEPT    CREAT_TS    RAF_IND
AA  &CR     2018-06-22-06.48.49.601000   
03  CUE 2018-06-22-11.43.29.859000   
90  T0L 2018-06-22-11.54.52.633000   

Mapping Data:

BRANCH  UNIT    MASTER
03  MAS CoE
04  NAS ET
05  ET  ET

In the error message these are quite prominent.

 # validate the merge keys dtypes. We may need to coerce

# work-around for merge_asof(right_index=True)
# duplicate columns & possible reduce dimensionality

How do i avoid this issue.

I even tried left_on = 'True', right_on = 'True'

left_key = 'lkey', right_key = 'rkey'. I get the error 'rkey not found

Regards, Ren.

Upvotes: 0

Views: 1220

Answers (1)

Trenton McKinney
Trenton McKinney

Reputation: 62453

The main difference seems to be that I don't set 'BRANCH' as the index.

Additionally, mapping 'BRANCH' is imported as int64 because the example only has numbers, whereas upload_raw 'BRANCH' is imported as object.

upload_raw = pd.read_excel('data/2018-09-03_data_mapping.xlsx',
                           sheet_name = 'Upload',
                           header = 0)
mapping = pd.read_excel(r'data/2018-09-03_data_mapping.xlsx',
                        sheet_name = 'Mapping',
                        header = 0)
print(upload_raw)

output:
    BRANCH  DEPT    CREAT_TS    RAF_IND
0   AA  &CR 2018-06-22-06.48.49.601000  NaN
1   3   CUE 2018-06-22-11.43.29.859000  NaN
2   90  T0L 2018-06-22-11.54.52.633000  NaN

mapping['BRANCH'] = mapping['BRANCH'].astype('object')

print(mapping)

output:
    BRANCH  UNIT    MASTER
0   3   MAS CoE
1   4   NAS ET
2   5   ET  ET

upload_lookup=pd.merge(left=upload_raw, right=mapping, on='BRANCH')

print(upload_lookup)

output:
    BRANCH  DEPT    CREAT_TS    RAF_IND UNIT    MASTER
0   3   CUE 2018-06-22-11.43.29.859000  NaN MAS CoE

Upvotes: 1

Related Questions