Reputation: 133
hello people (I am new to python) Question: how can I compare two DataFrames that have the same index but different formats.
Here are two examples of the DataFrames:
datum_von datum_bis NEG_00_04
2017-10-12 2017-10-12 21.69
2017-10-13 2017-10-13 11.85
2017-10-14 2017-10-14 7.83
2017-10-15 2017-10-15 14.64
2017-10-16 2017-10-16 5.11
2017-10-17 2017-10-17 12.09
2017-10-18 2017-10-18 8.47
2017-10-19 2017-10-19 6.34
2017-10-20 2017-10-20 7.68
2017-10-21 2017-10-21 13.40
2017-10-22 2017-10-22 25.53
datum_von NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 0.00 0.00 0.00 0.00
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
2017-10-18 8.47 7.39 0.24 3.51 0.48 0.61
Note that they have the same index but not the same name for columns (I tried to use d.columns = list_of_names also did not worked it )
they look different but they are the same (contain the same information) but I did not copy all the data.I would like to know which values are different and how much is the difference
desired output: I would like to have the difference of the two date frame values cell by cell I mean cell[0][0] from d1 - cell[0][0] from df2 all the results in a new date frame
ok: I organize and now they have somehow the same organization
dfres = res0.pivot(index='datum_von', columns='produktname')['max']
Upvotes: 1
Views: 703
Reputation: 862691
Solution used
Better is use:
dfres = res0.pivot(index='datum_von', columns='produktname', values='max')
dfdiff=dfres.sub(finaldf)
And then:
mask = dfdiff < 0
df = dfdiff.loc[mask.any(axis=1), mask.any()]]
Also is possible filter original DataFrames, if same columns and index values as dfdiff
:
dfa = dfres.loc[mask.any(axis=1), mask.any()]]
dfb = finaldf.loc[mask.any(axis=1), mask.any()]]
Upvotes: 1
Reputation: 133
I used this to organize my data frame to be in the same format
dfres = res0.pivot(index='datum_von', columns='produktname')['max']
Then I used this to compute the difference
dfdiff=dfres.subtract(finaldf)
Upvotes: 0
Reputation: 477
I agree that you should be more specific about your desired output, however it most likely go along these lines
cols_overlap = [ c for c in df1.columns if c in df2.columns]
dfn = df1.merge(df2, how='inner', left_index=True, right_index=True, suffixes=('_df1','_df2'))
for c in cols_overlap:
print('%s' % c)
print((dfm[c+'_df1']==dfm[c+'_df2'] ).all())
Upvotes: 0