Reputation: 190
I am trying to check whether values in the column 'DataLineID:' of a workbook exists in columns 0 or 1 in another workbook with pandas. Where I want to have a new column called match added if there is a match.
My code is:
df_comparing = pd.read_excel(fName, sheet_name=0, index_col=None)
df_compare_basis = pd.read_excel(readxl, index_col=None)
df_comparing['Match'] = df_comparing[df_comparing.columns[0,1]].isin(df_compare_basis['DataLineID:']).astpye(int)
I get the error IndexError: too many indices for array
If I change the following to df_comparing[df_comparing.columns[0]]
I get AttributeError: 'Series' object has no attribute 'astpye'
Is there another way to achieve this?
EDIT:
Here is some sample data:
df_comparing = ({'Line no':['AL5176', 'AL5737', 'AL5978'], 'Line addition':[NaN, NaN, NaN, 'AL9876', 'AL6789', 'AL5945']})
df_compare_basis = ({'DataLineID:':['AL5176', 'AL5737', 'AL5978','AL9876', 'AL6789', 'AL5945']})
Output thats wanted:
df_comparing = ({'Line no':['AL5176', 'AL5737', 'AL5978'], 'Line addition':[NaN, NaN, NaN, 'AL9876', 'AL6789', 'AL5945'], 'Match':[1,1,1,1,1,1]})
Upvotes: 0
Views: 1583
Reputation: 4215
I guess this is what you are looking for:
import numpy as np
df_comparing['Match'] = np.where((df_comparing['Line no'].isin(df_compare_basis['DataLineID:'])) | (df_comparing['Line addition'].isin(df_compare_basis['DataLineID:'])), 1, 0)
Line no Line addition Match
0 AL5176 NaN 1
1 AL5737 NaN 1
2 AL5978 NaN 1
3 NaN AL9876 1
4 NaN AL6789 1
5 NaN AL5945 1
Upvotes: 1