Reputation: 451
I have two excel files, I'll name them: source.xlsx, output.xlsx.
I need to match the data using Caller ID column of source.xlsx to svc_no column of output.xlsx
If there is not a match or the value is 'NULL' using the Caller ID column, I can use the adsl column of source.xlsx to match with port column of output.xlsx.
If there is a match, then I should ignore the port and write the Caller ID
The data in source.xlsx looks like this:
Caller ID adsl Comparison Result
NULL 2/12 Not Match
11111111 2/267 Match
22222222 4/243 Match
22222222 2/117 Possible Match
The data in output.xlsx looks like this:
svc_no Caller ID port Comparison Result
22222222 4/243
11111111 2/267
22222222 2/117
NULL 2/12
My expected output would be to write the data from source.xlsx to output.xlsx:
svc_no Caller ID port Comparison Result
22222222 22222222 4/243 Match
11111111 11111111 2/267 Match
22222222 22222222 2/117 Possible Match
NULL NULL 2/12 Not Match
I tried using:
df = read_excel('source.xlsx')
df1 = read_excel('output.xlsx')
df = df['Caller ID'].isin(df1['svc_no'])]
df['Caller ID'] = df1['Caller ID']
df1.to_excel('output.xlsx')
But it does not match and write randomly.
Upvotes: 2
Views: 1146
Reputation: 164673
This is one way.
# filter output for 2 pre-populated columns
output = output[['svc_no', 'port']]
# add duplicate column
output['Caller ID'] = output['svc_no']
# create series mapping from source
s = source.set_index(['Caller ID', 'adsl'])['Comparison Result']
# map series to output
output['Comparison Result'] = output.set_index(['svc_no','port']).index.map(s.get)
print(output)
svc_no port Caller ID Comparison Result
0 2.22222e+07 4/243 2.22222e+07 Match
1 1.11111e+07 2/267 1.11111e+07 Match
2 2.22222e+07 2/117 2.22222e+07 PossibleMatch
3 NULL 2/12 NULL NotMatch
Upvotes: 1