May Emanuel
May Emanuel

Reputation: 133

Comparing two DataFrames in Pandas

hello people (I am new to python) Question: how can I compare two DataFrames that have the same index but different formats.

Here are two examples of the DataFrames:

datum_von   datum_bis   NEG_00_04                     
2017-10-12  2017-10-12      21.69
2017-10-13  2017-10-13      11.85
2017-10-14  2017-10-14       7.83
2017-10-15  2017-10-15      14.64
2017-10-16  2017-10-16       5.11
2017-10-17  2017-10-17      12.09
2017-10-18  2017-10-18       8.47
2017-10-19  2017-10-19       6.34
2017-10-20  2017-10-20       7.68
2017-10-21  2017-10-21      13.40
2017-10-22  2017-10-22      25.53


datum_von   NEG_00_04  NEG_04_08  NEG_08_12  NEG_12_16  NEG_16_20  NEG_20_24                                                                 
2017-10-12      21.69      15.36       0.87       1.42       0.76       0.65   
2017-10-13      11.85       8.08       1.39       2.86       1.02       0.55   
2017-10-14       7.83       5.88       1.87       2.04       2.29       2.18   
2017-10-15      14.64      11.28       2.62       3.35       2.13       1.25   
2017-10-16       5.11       5.82       0.00       0.00       0.00       0.00   
2017-10-17      12.09       9.61       0.20       1.09       0.39       0.57   
2017-10-18       8.47       7.39       0.24       3.51       0.48       0.61 

Note that they have the same index but not the same name for columns (I tried to use d.columns = list_of_names also did not worked it )

they look different but they are the same (contain the same information) but I did not copy all the data.I would like to know which values are different and how much is the difference

desired output: I would like to have the difference of the two date frame values cell by cell I mean cell[0][0] from d1 - cell[0][0] from df2 all the results in a new date frame

ok: I organize and now they have somehow the same organization

dfres = res0.pivot(index='datum_von', columns='produktname')['max']

Upvotes: 1

Views: 703

Answers (3)

jezrael
jezrael

Reputation: 862691

Solution used

Better is use:

dfres = res0.pivot(index='datum_von', columns='produktname', values='max')
dfdiff=dfres.sub(finaldf)

And then:

mask = dfdiff < 0
df = dfdiff.loc[mask.any(axis=1), mask.any()]]

Also is possible filter original DataFrames, if same columns and index values as dfdiff:

dfa = dfres.loc[mask.any(axis=1), mask.any()]]
dfb = finaldf.loc[mask.any(axis=1), mask.any()]]

Upvotes: 1

May Emanuel
May Emanuel

Reputation: 133

I used this to organize my data frame to be in the same format

dfres = res0.pivot(index='datum_von', columns='produktname')['max']

Then I used this to compute the difference

dfdiff=dfres.subtract(finaldf)

Upvotes: 0

mzoll
mzoll

Reputation: 477

I agree that you should be more specific about your desired output, however it most likely go along these lines

 cols_overlap = [ c for c in df1.columns if c in df2.columns]
 dfn = df1.merge(df2, how='inner', left_index=True, right_index=True, suffixes=('_df1','_df2'))

 for c in cols_overlap:
      print('%s' % c)
      print((dfm[c+'_df1']==dfm[c+'_df2'] ).all())

Upvotes: 0

Related Questions