Compare 2 dataframes imported from excel

Question

I am working on data comparison project. Here i have 2 excel files 'Prod1' and 'Proj1' which i had imported in python as data frames. Both the data frames have 46 identical columns with identical labels. But different number of rows across both files (e.g. Prod1 is having 100 while Proj1 is 110). I want to compare cell by cell values and highlight the mismatch as 'TRUE' or 'FALSE'in another dataframe. Which can be further exported in excel. Till now i am able to read the files but got struck in comparison part. Kindly assist

import os

import pandas as pd

import numpy as np

import xlrd

os.getcwd()

os.chdir("C:\Users\desktop\Python Project")

Prod1= pd.read_excel("Prod1.xls")

df1 = pd.read_excel('Prod1.xls', 'Prod1')

print(df1)

df2 = pd.read_excel('Proj1.xls', 'Proj1')

print(df2)

Benjamin Rowell · Accepted Answer

Assuming df2 is the larger dataframe, as you say in your explanation, resize to match df1:

>>> df1 = df1.reindex(index=df2.index)

Then just print the following:

>>> print(df1 == df2)

Example

# Define two Pandas Dataframes

>>> df1 = pd.DataFrame(data={'col1': [1,2], 'col2': [3,4]})
>>> df2 = pd.DataFrame(data={'col1': [1,2,3], 'col2': [3,4,5]})

>>> df1
   col1  col2
0     1     3
1     2     4

>>> df2
   col1  col2
0     1     3
1     2     4
2     3     5

The Dataframes purposely don't match in size, so if you tried to do the following:

>>> print(df1 == df2)

You would see an error like this:

ValueError: Can only compare identically-labeled DataFrame objects

In order to compare the two, use the index of the bigger Dataframe df2 (this describes the labels applied to each row) and we use it to redefine the index of df1. Note this only works if the two datasets utilise the same indexing.

>>> df1 = df1.reindex(index=df2.index)
>>> df1
   col1  col2
0   1.0   3.0
1   2.0   4.0
2   NaN   NaN

Now if you try to compare them, you will see the following:

>>> print(df1 == df2)
    col1   col2
0   True   True
1   True   True
2  False  False

Compare 2 dataframes imported from excel

Answers (2)

Related Questions