Reputation: 749
I'm having an issue with pandas that has me a little baffled. I have a file with a lot of numeric values that do not need calculations. Most of them come out just fine, but a couple are getting ".0" appended to the end.
Here is a sample input file:
Id1 Id2 Age Id3
"SN19602","1013743", "24", "23523"
"SN20077","2567897", "28", "24687"
And the output being generated:
Id1 Id2 Age Id3
"SN19602","1013743.0", "24", "23523"
"SN20077","2567897.0", "28", "24687"
Can anyone explain why some, but not all, of the numeric values are getting the .0 appended, and whether there is any way I can prevent it? It causes problems in the next step of my process, which consumes the CSV output.
I have tried converting the data frame, and the column itself, to a string, but it made no difference. Ideally I do not want to list each column to convert, because I have a very large number of columns and would have to go through the output file manually to figure out which ones are getting the .0 appended and code for each one. Any suggestions appreciated.
import pandas as pd
import csv
df_inputFile = pd.read_csv("InputFile.csv")
df_mappingFile = pd.read_csv("MappingFile.csv")
df_merged = df_inputFile.merge(df_mappingFile, left_on="Id", right_on="Id", how="left")
#This isn't affecting the output
df_merged.astype(str)
df_merged.to_csv("Output.csv", index=False, quoting=csv.QUOTE_ALL)
Upvotes: 6
Views: 8374
Reputation: 53
I have recently faced this issue. In my case, a column like the Id2 column in the question had an empty cell that pandas interpreted as NaN; that single NaN forces the whole column to float, so all the other cells of that column got a trailing .0.
Reading the file with keep_default_na=False helps to avoid those trailing .0:
my_df = pd.read_csv("data.csv", keep_default_na=False)
P.S.: I know this answer is a late one, but it worked for me without enforcing data types while reading the data or having to apply a float format.
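As a minimal sketch of the coercion involved (hypothetical inline data, not the asker's files): the single empty cell drags the whole column to float64 under the defaults, while keep_default_na=False keeps it as strings.
import io
import pandas as pd

# Hypothetical sample: one empty cell in the Id2 column.
data = '"Id1","Id2"\n"SN19602","1013743"\n"SN20077",""\n'

# Default: the empty cell becomes NaN, the column is coerced to float64,
# and every other value gains a trailing .0 when written back out.
print(pd.read_csv(io.StringIO(data))["Id2"].dtype)   # float64

# keep_default_na=False keeps the empty cell as "", so the column stays
# object (string) dtype and nothing is rewritten as a float.
print(pd.read_csv(io.StringIO(data), keep_default_na=False)["Id2"].dtype)   # object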
Upvotes: 0
Reputation: 61
Pandas is probably inferring the datatype of that column as float, which is why you are getting .0 appended to the data. You can pass dtype=object to pd.read_csv:
df_inputFile = pd.read_csv("InputFile.csv", dtype=object)
This makes pandas read and treat every column as a string.
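Applied to the question's code, that would look something like the sketch below (assuming the same file names as the question; dtype=str behaves like dtype=object here, and both files need it so the merge keys keep matching types):
import csv
import pandas as pd

# Read every column as a string in both files so no numeric inference happens.
df_inputFile = pd.read_csv("InputFile.csv", dtype=str)
df_mappingFile = pd.read_csv("MappingFile.csv", dtype=str)

df_merged = df_inputFile.merge(df_mappingFile, on="Id", how="left")
df_merged.to_csv("Output.csv", index=False, quoting=csv.QUOTE_ALL)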
Upvotes: 6
Reputation: 18446
pandas.DataFrame.to_csv has a float_format parameter, which takes a standard (printf-style) float formatting string. This should work:
df_merged.to_csv("Output.csv", index=False, quoting=csv.QUOTE_ALL, float_format='%.0f')
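Note that float_format applies to every float column in the frame, so it is safest when all your floats are really integers in disguise. A minimal sketch with hypothetical inline data:
import csv
import io
import pandas as pd

# Id2 is inferred as float64 because of the empty cell.
df = pd.read_csv(io.StringIO('"Id2"\n"1013743"\n""\n'))

# float_format suppresses the decimals on write; the NaN row is still written as "".
df.to_csv("out.csv", index=False, quoting=csv.QUOTE_ALL, float_format='%.0f')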
Upvotes: 6
Reputation: 560
I like loops. They are slow, but easy to understand. The logic is straightforward, and this approach also allows different formatting/decimals for each column.
Something like:
final_out = open("Output.txt", 'w')
for index, row in df.iterrows():
    # Format each value with no decimal places; 'A', 'B', 'C' are the column names.
    print('{:.0f}'.format(row['A']), '{:.0f}'.format(row['B']), '{:.0f}'.format(row['C']), sep=",", file=final_out)
final_out.close()
I think a better/faster way to do this is with something like tabulate or pretty printer.
First, convert your dataframe to an array; this is easy:
array = df.values
Then you can use something neat like tabulate.
from tabulate import tabulate as tb

final_out = open("Output.txt", 'w')
print(tb(array, numalign="right", floatfmt=".0f"), file=final_out)
final_out.close()
You can read up a little more on tabulate or pretty printer; the above is a contextual example to get you started.
Similar to the loop above, tabulate allows a separator, which could be a comma; see https://pypi.python.org/pypi/tabulate under "Usage of the command line utility".
I'm pretty sure pretty printer can do this too, and it could very well be a better choice.
Both of these use the new Python print function. If you use Python 2.7 you will need this nifty little statement as the first non-comment line in your script:
from __future__ import print_function
Upvotes: 0