cd-6
cd-6

Reputation: 190

Check if values from one column, exists in multiple columns in another dataframe

I am trying to check whether values in the column 'DataLineID:' of a workbook exists in columns 0 or 1 in another workbook with pandas. Where I want to have a new column called match added if there is a match.

My code is:

df_comparing = pd.read_excel(fName, sheet_name=0, index_col=None)

df_compare_basis = pd.read_excel(readxl, index_col=None) 

df_comparing['Match'] = df_comparing[df_comparing.columns[0,1]].isin(df_compare_basis['DataLineID:']).astpye(int)

I get the error IndexError: too many indices for array

If I change the following to df_comparing[df_comparing.columns[0]] I get AttributeError: 'Series' object has no attribute 'astpye'

Is there another way to achieve this?

EDIT:

Here is some sample data:

df_comparing = ({'Line no':['AL5176', 'AL5737', 'AL5978'], 'Line addition':[NaN, NaN, NaN, 'AL9876', 'AL6789', 'AL5945']})

df_compare_basis = ({'DataLineID:':['AL5176', 'AL5737', 'AL5978','AL9876', 'AL6789', 'AL5945']})

Output thats wanted:

df_comparing = ({'Line no':['AL5176', 'AL5737', 'AL5978'], 'Line addition':[NaN, NaN, NaN, 'AL9876', 'AL6789', 'AL5945'], 'Match':[1,1,1,1,1,1]})

Upvotes: 0

Views: 1583

Answers (1)

luigigi
luigigi

Reputation: 4215

I guess this is what you are looking for:

import numpy as np
df_comparing['Match'] = np.where((df_comparing['Line no'].isin(df_compare_basis['DataLineID:'])) | (df_comparing['Line addition'].isin(df_compare_basis['DataLineID:'])), 1, 0)

  Line no Line addition  Match
0  AL5176           NaN      1
1  AL5737           NaN      1
2  AL5978           NaN      1
3     NaN        AL9876      1
4     NaN        AL6789      1
5     NaN        AL5945      1

Upvotes: 1

Related Questions