EMC

Reputation: 749

Pandas appending .0 to a number

I'm having an issue with pandas that I'm a little baffled by. I have a file with a lot of numeric values that do not need calculations. Most of them come out just fine, but a couple are getting ".0" appended to the end.

Here is a sample input file:

Id1         Id2      Age     Id3
"SN19602","1013743", "24", "23523"
"SN20077","2567897", "28", "24687"

And the output being generated:

Id1         Id2      Age     Id3
"SN19602","1013743.0", "24", "23523"
"SN20077","2567897.0", "28", "24687"

Can anyone explain why some but not all of the numeric values are getting the .0 appended, and if there is any way I can prevent it? It is a problem when I perform the next step of my process with the CSV output.

I have tried converting the data frame, and the column itself, to a string, but it did not make an impact. Ideally I do not want to list each column to convert, because I have a very large number of columns and would have to manually go through the output file to figure out which ones are getting the .0 appended and code for them. Any suggestions appreciated.

import pandas as pd
import csv

df_inputFile = pd.read_csv("InputFile.csv")
df_mappingFile = pd.read_csv("MappingFile.csv")
df_merged = df_inputFile.merge(df_mappingFile, left_on="Id", right_on="Id", how="left")
#This isn't affecting the output
df_merged.astype(str)
df_merged.to_csv("Output.csv", index=False, quoting=csv.QUOTE_ALL)
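For reference, here is a minimal sketch (with made-up data) of the likely mechanism: a blank cell, or a left merge with unmatched rows, introduces NaN, and since NumPy integer dtypes cannot hold NaN, pandas upcasts the whole column to float, which is then written with a trailing .0:

```python
import io
import pandas as pd

# A blank cell in an otherwise-integer column becomes NaN,
# which forces the column dtype from int64 to float64.
csv_data = "Id1,Id2\nSN19602,1013743\nSN20077,\n"
df = pd.read_csv(io.StringIO(csv_data))
print(df["Id2"].dtype)       # float64

# On write, every surviving value picks up a trailing .0.
out = df.to_csv(index=False)
print(out)
```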

Upvotes: 6

Views: 8374

Answers (4)

Nijanthan Vijayakumar

Reputation: 53

I recently faced this issue. In my case, a column similar to the Id2 column in the question had an empty cell that pandas interpreted as NaN, and all the other cells in that column picked up a trailing .0.

Reading the file with keep_default_na=False avoids those trailing .0:

my_df = pd.read_csv("data.csv", keep_default_na=False)

P.S.: I know this is a late answer, but it worked for me without enforcing data types while reading the data or having to use float formatting.
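A minimal sketch (with made-up data) of the behaviour described above: one empty cell is enough to turn the column into float, while keep_default_na=False keeps everything as the strings from the file:

```python
import io
import pandas as pd

csv_data = "A,Id2\nx,1013743\ny,\n"

# Default: the empty cell becomes NaN, so Id2 is upcast to float64.
df_default = pd.read_csv(io.StringIO(csv_data))
print(df_default["Id2"].dtype)   # float64

# keep_default_na=False keeps the empty cell as "", so the column
# stays object (string) and no .0 is introduced on write.
df_kept = pd.read_csv(io.StringIO(csv_data), keep_default_na=False)
print(df_kept["Id2"].dtype)      # object
```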

Upvotes: 0

Sreekanth reddy

Reputation: 61

Pandas is probably inferring the datatype of that column as float, which is why .0 is appended to the data. You can use dtype=object in pd.read_csv:

df_inputFile = pd.read_csv("InputFile.csv", dtype=object)

This makes pandas read all columns as strings.
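A small sketch (with made-up data) showing the effect: with dtype=object the values survive as the exact strings from the file, so nothing picks up a .0 on write:

```python
import io
import pandas as pd

csv_data = "Id2,Age\n1013743,24\n2567897,28\n"

# dtype=object disables numeric inference; every cell stays a string.
df = pd.read_csv(io.StringIO(csv_data), dtype=object)
print(df["Id2"].tolist())    # ['1013743', '2567897']

out = df.to_csv(index=False)
print(out)                   # no trailing .0 anywhere
```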

Upvotes: 6

Carsten

Reputation: 18446

pandas.DataFrame.to_csv has a parameter float_format, which takes a regular float formatting string. This should work:

df_merged.to_csv("Output.csv", index=False, quoting=csv.QUOTE_ALL, float_format='%.0f')
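A quick sketch (with made-up data) of the round trip. One caveat: float_format applies to every float column in the frame, so columns with genuine decimals would be reformatted too:

```python
import csv
import pandas as pd

# Id2 has already been read as float; %.0f strips the fraction on write.
df = pd.DataFrame({"Id2": [1013743.0, 2567897.0], "Age": [24, 28]})
out = df.to_csv(index=False, quoting=csv.QUOTE_ALL, float_format="%.0f")
print(out)
```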

Upvotes: 6

PhysicalChemist

Reputation: 560

I like loops. They are slow, but easy to understand. The logic is plain, and it also allows different formatting/decimals for each column.

Something like:

final_out = open("Output.txt", 'w')

for index, row in df.iterrows():
    print('{:.0f}'.format(row['A']), '{:.0f}'.format(row['B']), '{:.0f}'.format(row['C']), sep=",", file=final_out)

final_out.close()

I think a better/faster way to do this is with something like tabulate or pretty printer.

First, convert your dataframe to an array; this is easy:

array = df.values

Then you can use something neat like tabulate.

from tabulate import tabulate as tb

final_out = open("Output.txt", 'w')
print(tb(array, numalign="right", floatfmt=".0f"), file=final_out)
final_out.close()

You can read up a little more on tabulate or pretty printer; the above is a contextual example to get you started.

Similar to the loop above, tabulate allows a separator, which could be a comma: see https://pypi.python.org/pypi/tabulate under "Usage of the command line utility".

Pretty sure pretty printer can do this too, and it could very well be a better choice.


Both of these use the new Python printing. If you use Python 2.7, you will need this nifty little statement as the first non-comment line in your script:

from __future__ import print_function

Upvotes: 0
