Compare two dataframes without calculating the correlation

Question

Assume I have two dataframes, each containing hundreds of columns and rows. I would like to compare them cell by cell, matching on the same row and column. For example,

import pandas as pd

df1 = pd.DataFrame({
              'Place' : ['A', 'B', 'C','D'],
              'Peter' : [4,5,1.2,7],
              'John' : [1,0,3,5],
                 })
df1_1 = df1.set_index('Place')


df2 = pd.DataFrame({
              'Place' : ['A', 'B', 'C','D'],
              'Peter' : ['NA',5,1.2,8.5],
              'John' : [1,0,3,5],
                 })
df2_2 = df2.set_index('Place')

For the Peter column in df1_1 and df2_2, rows B and C are the same but the others are not, so the proportion of matching values in the Peter column is 2/4 = 0.5. Likewise, for the John column it is 4/4 = 1.00.

Is there an elegant way to do this using pandas?

Ted Petrou · Accepted Answer

You should be able to do (df1_1 == df2_2).mean(), which compares the two dataframes element by element, label for label, and turns each value into a boolean. Taking the mean of each column then returns the fraction of rows that match.

Your dataframes need to be identically labeled.

Output

John     1.0
Peter    0.5
dtype: float64
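
For reference, here is a minimal, self-contained sketch of the comparison using the question's data. The names match_rate, shuffled, realigned and realigned_rate are hypothetical and only used for illustration. The second half assumes a case the question does not show: a dataframe holding the same labels in a different order, which == rejects with a ValueError, so it is reindexed to match first.

```python
import pandas as pd

# The two indexed dataframes from the question.
df1_1 = pd.DataFrame({
    'Place': ['A', 'B', 'C', 'D'],
    'Peter': [4, 5, 1.2, 7],
    'John': [1, 0, 3, 5],
}).set_index('Place')

df2_2 = pd.DataFrame({
    'Place': ['A', 'B', 'C', 'D'],
    'Peter': ['NA', 5, 1.2, 8.5],
    'John': [1, 0, 3, 5],
}).set_index('Place')

# Element-wise equality gives a boolean dataframe; the mean of each
# boolean column is the fraction of rows that match.
match_rate = (df1_1 == df2_2).mean()
print(match_rate)

# Hypothetical variant: the same data as df2_2, with rows and columns reordered.
shuffled = df2_2.loc[['D', 'C', 'B', 'A'], ['John', 'Peter']]

# == only accepts identically labeled dataframes, so this comparison fails.
try:
    df1_1 == shuffled
    raised = False
except ValueError:
    raised = True

# Reorder the second dataframe to the first one's labels, then compare again.
realigned = shuffled.reindex(index=df1_1.index, columns=df1_1.columns)
realigned_rate = (df1_1 == realigned).mean()
print(realigned_rate)
```

Both printed results should agree with the output above: 0.5 for Peter and 1.0 for John. Note that the 'NA' in the question is a plain string. A real missing value (NaN) never compares equal to another NaN, so cells that are missing in both dataframes would count as mismatches with this approach.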
