user3841581

Reputation: 2747

Comparing the content of a list of pandas DataFrames

I have a list that contains three pandas DataFrames. All the DataFrames have exactly the same column names and the same length. I would like to compare all the entries of a specific column in each DataFrame. Assume that the list is:

List=[df1,df2,df3].

Each DataFrame has the following structure. df1 has the structure

column1    column2   column3
  4          3          4
  4          5          7
  7          6          6
  8          6          4

df2 has the structure

column1    column2   column3
  4          3          4
  7          5          7
  7          6          5
  8          6          4

df3 has the structure

column1    column2   column3
  4          3          5
  4          1          7
  7          6          6
  8          6          4

I would like to compare the content of column1 and column2 of df1 (for each row) with the content of column1 and column2 of df2 and of df3.

I thought about something like this:

for i in range(len(List)):  # iterate through the list
    for j in range(len(List[0].index)):  # iterate through the rows of the DataFrame
        # I would like to do something like: if df1[column1][row1] == df2[column1][row1] then do ...
        # Now I don't know how to iterate through all the DataFrames simultaneously to compare column1 and column2 (for each row) of df1 with column1 and column2 of df2 and of df3.
        pass

I am stuck there

Upvotes: 1

Views: 129

Answers (1)

Helder

Reputation: 548

First, create dataframes with the data provided

import pandas as pd

df1 = pd.DataFrame({
    'column1': [4,4,7,8],
    'column2': [3,5,6,6],
    'column3': [4,7,6,4]
})
print(df1)
#    column1  column2  column3
# 0        4        3        4
# 1        4        5        7
# 2        7        6        6
# 3        8        6        4

df2 = df1.copy()
# Use .loc instead of chained indexing, which may not modify df2 itself
df2.loc[1, 'column1'] = 7
df2.loc[2, 'column3'] = 5
print(df2)
#    column1  column2  column3
# 0        4        3        4
# 1        7        5        7
# 2        7        6        5
# 3        8        6        4

df3 = df1.copy()
# Use .loc instead of chained indexing, which may not modify df3 itself
df3.loc[1, 'column2'] = 1
df3.loc[0, 'column3'] = 5
print(df3)
#    column1  column2  column3
# 0        4        3        5
# 1        4        1        7
# 2        7        6        6
# 3        8        6        4

Then, to get a DataFrame of the same shape, with a boolean value indicating which entries are equal in both DataFrames:

print(df1.eq(df2))
#    column1  column2  column3
# 0     True     True     True
# 1    False     True     True
# 2     True     True    False
# 3     True     True     True

To get a series of booleans indicating for which columns all the corresponding rows are equal in both dataframes

print(df1.eq(df2).all())
# column1    False
# column2     True
# column3    False
# dtype: bool

To get a series of booleans indicating for which rows all the corresponding columns are equal in both dataframes

print(df1.eq(df2).all(axis='columns'))
# 0     True
# 1    False
# 2    False
# 3     True
# dtype: bool
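
Since the question only cares about column1 and column2, the same row-wise check can be restricted to those columns. This is a minimal sketch that rebuilds df1 and df2 from the question's data; the names `cols` and `rows_match` are only illustrative, and it assumes both DataFrames share the same index.

```python
import pandas as pd

# Data as given in the question
df1 = pd.DataFrame({'column1': [4, 4, 7, 8],
                    'column2': [3, 5, 6, 6],
                    'column3': [4, 7, 6, 4]})
df2 = pd.DataFrame({'column1': [4, 7, 7, 8],
                    'column2': [3, 5, 6, 6],
                    'column3': [4, 7, 5, 4]})

cols = ['column1', 'column2']  # columns of interest (illustrative name)

# True for each row where column1 AND column2 are equal in df1 and df2
rows_match = df1[cols].eq(df2[cols]).all(axis='columns')
print(rows_match.tolist())
# [True, False, True, True]
```

Row 2 now counts as a match, because the only difference there is in column3, which is ignored.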

To get a single boolean indicating whether all corresponding entries are equal in both DataFrames:

print(df1.equals(df2))
# False
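
One difference worth knowing, although it does not matter for the integer data above: `equals` treats NaN values in the same position as equal, whereas `eq` compares element-wise, so NaN never equals NaN. The sketch below uses made-up data (`left` and `right` are hypothetical names) purely to show that difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing a missing value
left = pd.DataFrame({'column1': [1.0, np.nan]})
right = left.copy()

# equals: NaNs in the same location are considered equal
same_by_equals = left.equals(right)

# eq: element-wise comparison, where NaN == NaN is False
same_by_eq = bool(left.eq(right).all().all())

print(same_by_equals, same_by_eq)
# True False
```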

If you need to compare every pair of DataFrames in the list, you can use itertools.combinations:

from itertools import combinations
List = [df1, df2, df3]
for a, b in combinations(enumerate(List, 1), 2):
    print(f'df{a[0]}.equals(df{b[0]}): ', a[1].equals(b[1]))
# df1.equals(df2):  False
# df1.equals(df3):  False
# df2.equals(df3):  False
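
To get closer to what the question asks (comparing column1 and column2 of df1 with the same columns of every other DataFrame in the list, row by row), one possible approach is to build one boolean mask per DataFrame and combine them. This is only a sketch: `frames`, `cols`, `masks` and `all_agree` are illustrative names, and it assumes all DataFrames share the same index.

```python
import pandas as pd

# Data as given in the question
df1 = pd.DataFrame({'column1': [4, 4, 7, 8],
                    'column2': [3, 5, 6, 6],
                    'column3': [4, 7, 6, 4]})
df2 = pd.DataFrame({'column1': [4, 7, 7, 8],
                    'column2': [3, 5, 6, 6],
                    'column3': [4, 7, 5, 4]})
df3 = pd.DataFrame({'column1': [4, 4, 7, 8],
                    'column2': [3, 1, 6, 6],
                    'column3': [5, 7, 6, 4]})

frames = [df1, df2, df3]
cols = ['column1', 'column2']

reference = frames[0][cols]  # df1 is the baseline

# One boolean Series per other DataFrame:
# True where the row matches df1 on both columns
masks = [reference.eq(other[cols]).all(axis='columns')
         for other in frames[1:]]

# True where every DataFrame agrees with df1 on that row
all_agree = pd.concat(masks, axis=1).all(axis=1)
print(all_agree.tolist())
# [True, False, True, True]

# "then do ..." for the matching rows
for row in all_agree[all_agree].index:
    print(f'row {row}: column1 and column2 match in every DataFrame')
```

This avoids the nested Python loops from the question; the per-row action only runs for rows already known to match.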

Upvotes: 0
