Reputation: 57
I am using pandas and have imported two CSV files.
df1 is
df2 is
The data type of df2 is
When I merge df1 and df2:
df3= pd.merge(df1, df2, how='left', on=['Origin City Code', 'DC'])
and then export the result to CSV:
df3.to_csv("test.CSV")
the sum of the values in the "Volume" column of df3 does NOT match the sum of the same column in the original df2. In fact, the sum in df3 comes out larger. I believe the issue is caused by floating-point numbers, but is there any way to resolve it? I have gone through the following links, but my question remains unanswered.
https://github.com/pydata/pandas/issues/2069
reading and writing csv in pandas changes cell values
Wrong decimal calculations with pandas
Here are the files I am using: https://www.dropbox.com/s/kjpnhl7qtojes92/sample.rar?dl=0
Upvotes: 0
Views: 1848
Reputation: 807
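I looked at your files. As @root was saying above, the combination of Origin City Code and DC is not unique in df1. For instance, there are two records with Origin City Code = GGN and DC = ASA. In a left merge, each of those duplicate rows picks up the same matching row from df2, so that row's Volume is counted more than once. That is why the sum in df3 is larger; it is not a floating-point problem.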
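To see the effect in isolation, here is a minimal sketch with made-up data. The column names come from the question, but the rows and Volume values are invented for illustration and are not taken from the actual files. It shows the sum growing after the merge, and how the `validate` argument of `pd.merge` can flag the duplicate keys up front.

```python
import pandas as pd

# Hypothetical miniature stand-ins for df1 and df2 (values are invented).
df1 = pd.DataFrame({
    "Origin City Code": ["GGN", "GGN", "DEL"],  # GGN/ASA appears twice
    "DC": ["ASA", "ASA", "ASA"],
})
df2 = pd.DataFrame({
    "Origin City Code": ["GGN", "DEL"],
    "DC": ["ASA", "ASA"],
    "Volume": [10.0, 5.0],
})

keys = ["Origin City Code", "DC"]

# Same merge as in the question: every df1 row gets its matching df2 row,
# so the GGN/ASA volume is attached to two rows.
df3 = pd.merge(df1, df2, how="left", on=keys)

original_total = df2["Volume"].sum()
merged_total = df3["Volume"].sum()
print(original_total, merged_total)

# validate="one_to_many" asks pandas to check that the keys are unique
# in the left frame (df1) and to raise MergeError if they are not.
duplicates_detected = False
try:
    pd.merge(df1, df2, how="left", on=keys, validate="one_to_many")
except pd.errors.MergeError:
    duplicates_detected = True
print(duplicates_detected)
```

Whether the duplicate rows in df1 should be dropped, aggregated, or kept depends on what they represent in your data, so this sketch only detects them and does not pick a fix.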
To list the duplicated key combinations in df1 yourself, you can run the following code:
df1[df1.duplicated(subset=['Origin City Code', 'DC'], keep=False)].sort_values(['Origin City Code', 'DC'])
Here's the head of this output:
Upvotes: 1