Compare values from two pandas data frames, order-independent

Question

I am new to data science. I want to check which elements of one DataFrame exist in another DataFrame, for example:

df1 = [1,2,8,6]
df2 = [5,2,6,9]

# for 1 output should be False

# for 2 output should be True

# for 6 output should be True

etc.
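For the plain-list example above, pandas' Series.isin produces exactly this per-element True/False result. A minimal sketch, assuming the two lists are wrapped in Series objects (the question shows them as bare lists):

```python
import pandas as pd

# Assumption: the question's lists wrapped in pandas Series
df1 = pd.Series([1, 2, 8, 6])
df2 = pd.Series([5, 2, 6, 9])

# isin returns a boolean Series: True where the df1 value occurs anywhere in df2
mask = df1.isin(df2)
print(mask.tolist())  # [False, True, False, True]
```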

Note: my data is a matrix, not a vector.
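For a matrix, DataFrame.isin works the same way as long as the other frame is flattened first. Passing df2 itself would compare cells at matching row/column labels, which is order-dependent. A sketch using hypothetical 2x2 frames built from the numbers above:

```python
import pandas as pd

# Hypothetical 2x2 frames built from the question's numbers, for illustration only
df1 = pd.DataFrame([[1, 2], [8, 6]])
df2 = pd.DataFrame([[5, 2], [6, 9]])

# Flatten df2 to a 1-D array so every one of its cells is a candidate match
mask = df1.isin(df2.values.ravel())
print(mask.values.tolist())  # [[False, True], [False, True]]
```

For comparison, df1.isin(df2) on these frames gives [[False, True], [False, False]], because 6 sits at a different position in df2.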

I have tried using the following code:

import pandas as pd
import numpy as np

# prioritylist_file_path, file_path and sheet are set earlier in my script (not shown)
priority_dataframe = pd.read_excel(prioritylist_file_path, sheet_name='Sheet1')

# Map each priority column name to an array of its lower-cased, non-empty values
priority_dict = {column: np.array(priority_dataframe[column].dropna().str.lower())
                 for column in priority_dataframe.columns}
keys_found_per_sheet = []
if file_path.lower().endswith('.csv'):
    file_dataframe = pd.read_csv(file_path)
else:
    file_dataframe = pd.read_excel(file_path, sheet_name=sheet)

# Flatten every non-empty cell of the file into a list of strings
file_cell_array = []
for column in file_dataframe.columns:
    for file_cell in file_dataframe[column].dropna():
        if isinstance(file_cell, str):
            file_cell_array.append(file_cell)
        else:
            file_cell_array.append(str(file_cell))

converted_file_cell_array = np.array(file_cell_array)

# Record each priority column that has at least one value present in the file
for key, values in priority_dict.items():
    for priority_cell in values:
        if priority_cell in converted_file_cell_array[:]:
            keys_found_per_sheet.append(key)
            break

Am I doing something wrong in the check "if priority_cell in converted_file_cell_array[:]"?
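The in test itself is valid on a NumPy array: it compares the value against every element and is True if any element matches. It is an exact, case-sensitive comparison, though, and in the code above only the priority values go through .str.lower(), so a file cell such as "Apple" will never match "apple". A small illustration with made-up values:

```python
import numpy as np

# Made-up file cells, already converted to strings
cells = np.array(["Apple", "pear", "42"])

# `x in array` compares x with every element and is True if any match
print("pear" in cells)     # True
# The comparison is exact, so a lower-cased value does not match "Apple"
print("apple" in cells)    # False
# Lower-casing the cells as well makes the check case-insensitive
lowered = np.char.lower(cells)
print("apple" in lowered)  # True
```

Each such check scans the whole array, so with many priority values a Python set built once is usually faster for repeated lookups.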

Is there a more efficient way to do this?
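One vectorised option, sketched under assumptions: lower-case both sides, collect the file cells into a set once, and let Series.isin test each priority column in a single call. find_priority_keys is a hypothetical helper name, and the sample frames stand in for the Excel/CSV data:

```python
import pandas as pd


def find_priority_keys(priority_df: pd.DataFrame, file_df: pd.DataFrame) -> list:
    """Hypothetical helper: return the priority columns that have at least one
    value appearing anywhere in file_df (case-insensitive comparison)."""
    # Collect every non-empty file cell once, as lower-cased strings
    file_values = set()
    for column in file_df.columns:
        file_values.update(file_df[column].dropna().astype(str).str.lower())

    keys_found = []
    for column in priority_df.columns:
        priority_values = priority_df[column].dropna().astype(str).str.lower()
        # isin tests the whole column against the set in one call
        if priority_values.isin(file_values).any():
            keys_found.append(column)
    return keys_found


# Made-up sample data standing in for the spreadsheets
priority = pd.DataFrame({"high": ["Apple", "kiwi"], "low": ["plum", None]})
data = pd.DataFrame({"a": ["apple", "pear"], "b": [1, 2]})
print(find_priority_keys(priority, data))  # ['high']
```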

DYZ · Accepted Answer

You can take the .values of each DataFrame, flatten them, convert them to a set(), and take the set intersection.

set1 = set(df1.values.reshape(-1).tolist())
set2 = set(df2.values.reshape(-1).tolist())
common = set1 & set2  # values present in both frames
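A self-contained version of the snippet above, using made-up frames of different shapes to show that only the values matter. The set difference gives the df1 values that are missing from df2 (the "False" cases from the question):

```python
import pandas as pd

# Made-up sample frames; shapes may differ because the values are flattened
df1 = pd.DataFrame([[1, 2], [8, 6]])
df2 = pd.DataFrame([[5, 2, 7], [6, 9, 3]])

# Flatten each frame's values into a Python set
set1 = set(df1.values.reshape(-1).tolist())
set2 = set(df2.values.reshape(-1).tolist())

common = set1 & set2        # values present in both frames
only_in_df1 = set1 - set2   # values of df1 not found in df2

print(sorted(common))       # [2, 6]
print(sorted(only_in_df1))  # [1, 8]
```

Sets only tell you which values are shared, not where they sit in the frame.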
