How to prevent dtype change when writing and loading csvs of numerical strings in pandas

Question

I am trying to write numerical strings to a CSV and read them back in as a DataFrame later on. However, on reading, pandas automatically converts my strings from object dtype to int64.

import pandas

df = pandas.DataFrame({'col1': ['00123', '00125']})
print(df['col1'].dtype)
df.to_csv('test.csv', index=False)
new_df = pandas.read_csv('test.csv')
print(new_df['col1'].dtype)

object #value of first print
int64 #value of second print

How do I either preserve the dtype on write or prevent the change on read?

EDIT: I noticed that if I use astype('|S') on df, new_df is now object dtype, even though the dtype of df does not change. This does not seem intuitive to me. If anyone can explain this, I would appreciate it.

df = pandas.DataFrame({'col1': ['00123', '00125']})
df['col1'] = df['col1'].astype('|S')
print(df['col1'].dtype)
df.to_csv('test.csv', index=False)
new_df = pandas.read_csv('test.csv')
print(new_df['col1'].dtype)

object #value of first print
object #value of second print

BENY · Accepted Answer

I recommend writing this kind of DataFrame to Excel instead:

df.to_excel('test.xlsx',index=False)

Or pass the column's dtype when reading the file:

pd.read_csv('test.csv',dtype = {'col1': object})
Out[346]: 
    col1
0  00123
1  00125
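
As a rough end-to-end check of the read-side fix, the sketch below round-trips the question's DataFrame through CSV text and reads it back twice: once with default type inference and once with dtype=str. It writes to an in-memory io.StringIO buffer rather than test.csv, which is an assumption made only to keep the example self-contained; passing dtype=str for the whole file (rather than a per-column dict as above) is likewise just one option.

```python
import io

import pandas as pd

df = pd.DataFrame({'col1': ['00123', '00125']})

# Write the CSV text to an in-memory buffer (stand-in for 'test.csv').
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Default read: pandas infers a numeric column, so leading zeros are lost.
buffer.seek(0)
inferred = pd.read_csv(buffer)

# dtype=str tells the parser to keep every column as text.
buffer.seek(0)
preserved = pd.read_csv(buffer, dtype=str)

print(inferred['col1'].tolist())
print(preserved['col1'].tolist())
```

CSV itself carries no type information, so the dtype has to be supplied again on every read. If only some columns hold numerical strings, a dict such as dtype={'col1': str} leaves inference on for the rest.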
