Ajeet

Reputation: 57

Pandas to_csv export giving wrong values in a dataframe

I am using pandas and have imported two CSV files.

df1 is

[screenshot of df1]

df2 is

[screenshot of df2]

The data types of df2 are

[screenshot of the df2 dtypes]

When I merge df1 and df2:

df3= pd.merge(df1, df2, how='left', on=['Origin City Code', 'DC'])

and then export the result to CSV

df3.to_csv("test.CSV")

then the sum of the values in the "Volume" column of df3 does NOT match the sum of the same column in the original df2. In fact, the sum in df3 comes out larger. I believe the issue is caused by floating-point numbers, but is there any way to resolve it? I have gone through the following links, but my question remains unanswered.

https://github.com/pydata/pandas/issues/2069

float64 with pandas to_csv

reading and writing csv in pandas changes cell values

Wrong decimal calculations with pandas

Here are the code and data files I am using: https://www.dropbox.com/s/kjpnhl7qtojes92/sample.rar?dl=0

Upvotes: 0

Views: 1848

Answers (1)

shawnheide

Reputation: 807

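I looked at your files. As @root was saying above, the combination of Origin City Code and DC is not unique in df1. For instance, there are two records with Origin City Code = GGN and DC = ASA. A left merge on those columns repeats the matching df2 row once for each duplicate in df1, so its Volume is counted more than once and the sum in df3 comes out larger than in df2. This is not a floating-point issue.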
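To see the effect in isolation, here is a minimal sketch with made-up data. The values and the Rate column are hypothetical and are not taken from your files; only the two key column names and Volume come from the question.

```python
import pandas as pd

# Hypothetical toy data: df1 has the key pair (GGN, ASA) twice.
df1 = pd.DataFrame({
    "Origin City Code": ["GGN", "GGN", "DEL"],
    "DC": ["ASA", "ASA", "BOM"],
    "Rate": [1.0, 2.0, 3.0],  # made-up column, just to make the rows differ
})
df2 = pd.DataFrame({
    "Origin City Code": ["GGN", "DEL"],
    "DC": ["ASA", "BOM"],
    "Volume": [10.5, 4.0],
})

df3 = pd.merge(df1, df2, how="left", on=["Origin City Code", "DC"])

# The (GGN, ASA) row of df2 is matched by two rows of df1,
# so its Volume appears twice in df3.
original_total = df2["Volume"].sum()
merged_total = df3["Volume"].sum()
print(original_total, merged_total)
```

With this toy data the merged total is larger than the original one by exactly the Volume of the duplicated key pair.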

If you want to check it out you can run the following code:

df1[df1.duplicated(subset=['Origin City Code', 'DC'], keep=False)].sort_values(['Origin City Code', 'DC'])

Here's the head of this output:

[screenshot of the first duplicated rows]
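If the duplicates are not intended, one possible way to guard against this is the validate argument of pd.merge, which raises a MergeError when the keys are not unique on the side you expect. The sketch below again uses made-up data, and keeping the first row of each duplicated key is only an assumption; which row is the right one to keep depends on your data.

```python
import pandas as pd
from pandas.errors import MergeError

keys = ["Origin City Code", "DC"]

# Hypothetical toy data with a duplicated key pair in df1.
df1 = pd.DataFrame({
    "Origin City Code": ["GGN", "GGN", "DEL"],
    "DC": ["ASA", "ASA", "BOM"],
    "Rate": [1.0, 2.0, 3.0],  # made-up column
})
df2 = pd.DataFrame({
    "Origin City Code": ["GGN", "DEL"],
    "DC": ["ASA", "BOM"],
    "Volume": [10.5, 4.0],
})

# validate="one_to_many" checks that the keys are unique in the left frame.
try:
    pd.merge(df1, df2, how="left", on=keys, validate="one_to_many")
    left_keys_unique = True
except MergeError:
    left_keys_unique = False

# Assumption: keeping the first row per key pair is acceptable.
df1_unique = df1.drop_duplicates(subset=keys, keep="first")
df3 = pd.merge(df1_unique, df2, how="left", on=keys, validate="one_to_one")

print(left_keys_unique, df3["Volume"].sum(), df2["Volume"].sum())
```

After dropping the duplicates, the Volume totals of df3 and df2 agree for this toy data. If both rows in df1 are legitimate, deduplicating is the wrong fix and the totals should instead be computed from df2 directly.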

Upvotes: 1
