I have the code below to parse some CSV data. The key is the last few lines; the rest is only there to show the context. In the end there are three columns in my data: the ID variable LopNr and year should already be integers, but I convert the entire DataFrame to integers just in case. Why do I get ".0" for the LopNr and year columns in the resulting CSV file, while the third column with aggregated data actually is converted to integers and is written without ".0"? I would have thought that after .astype(int) all columns would hold integers and would be exported to CSV without being converted back to floats.
import iopro
from pandas import *

neuro = DataFrame()
for year in xrange(2005, 2012):
    for month in xrange(1, 13):
        if year == 2005 and month < 7:
            continue
        filename = 'Q:\\drugs\\lmed_' + str(year) + '_mon' + str(month) + '.txt'
        adapter = iopro.text_adapter(filename, parser='csv', field_names=True,
                                     output='dataframe', delimiter='\t')
        monthly = adapter[['LopNr', 'ATC', 'TKOST']][:]
        monthly['year'] = year
        neuro = neuro.append(monthly[(monthly.ATC.str.startswith('N')) & (~(monthly.TKOST.isnull()))])
neuro = neuro.groupby(['LopNr', 'year']).sum()
neuro = neuro.astype(int)
neuro.to_csv('Q:\\drugs\\annual_neuro_costs.csv')
Upvotes: 6
Views: 6192
This is probably because your 'LopNr' and 'year' columns have null values. At present, pandas does not support integer columns with null values and instead upconverts the entire column to float.
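A minimal sketch of both the symptom and the fix, using made-up stand-in data (the column names just mirror the question): a single NaN forces a column to float64, and dropping the nulls before astype(int) restores integer dtype.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the asker's frame; one null in 'LopNr'.
df = pd.DataFrame({'LopNr': [1.0, np.nan, 3.0], 'year': [2005, 2006, 2007]})
print(df.dtypes)  # LopNr is float64 because of the NaN; year is int64

# Remove the rows with missing IDs, then the cast to int sticks.
df = df.dropna(subset=['LopNr']).astype(int)
print(df.dtypes)  # both columns are now int64
```

Whether dropping (or filling) the nulls is appropriate depends on what the missing IDs mean in your data.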
Edit:
As of version 0.24.0, there is preliminary support in Pandas for nullable integer data type.
By default, integers still get converted to floats if there are missing values:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, None], [5, None, 7]])
>>> print(df)
   0    1    2
0  1  2.0  NaN
1  5  NaN  7.0
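Note that you also cannot simply cast such a column back with astype(int) while the NaNs are still present; pandas raises an error rather than silently inventing integer values:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, None], [5, None, 7]])
try:
    df.astype(int)
except ValueError as e:
    # Non-finite values (NaN) cannot be represented as plain int.
    print("astype(int) failed:", e)
```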
However, if we specify dtype="Int64", this no longer happens:
>>> df = pd.DataFrame([[1, 2, None], [5, None, 7]], dtype="Int64")
>>> print(df)
   0     1     2
0  1     2  <NA>
1  5  <NA>     7
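An existing float column that was upcast because of missing values can also be converted after the fact with astype("Int64") (a small sketch; requires pandas >= 0.24):

```python
import numpy as np
import pandas as pd

# A column that became float64 due to a missing value...
s = pd.Series([1.0, np.nan, 3.0])
print(s.dtype)         # float64

# ...can be moved to the nullable integer dtype, keeping the NA.
s = s.astype("Int64")
print(s.dtype)         # Int64
```

Note the capital "I": "Int64" is the nullable extension dtype, while lowercase "int64" is the plain NumPy dtype that cannot hold missing values.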
Upvotes: 5