Reputation: 167
As the title says, I noticed that pandas' to_csv automatically turns columns that contain only numeric strings into numbers (float or int).
I am creating a dataframe in a Jupyter notebook with a column 'A' filled with the value '1', so I have a dataframe consisting of a single column of the string '1'.
When I convert my dataframe to a csv file with to_csv, the output csv file is a single column full of the integer 1.
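Roughly, here is a minimal sketch of what I am doing (the column 'A' is the one mentioned above; the file name is just an example):
import pandas as pd

df = pd.DataFrame({'A': ['1', '1', '1']})   # a column holding the string '1'
print(df['A'].dtype)                        # object

df.to_csv('data.csv', index=False)          # the file contains bare 1s, without quotes

df2 = pd.read_csv('data.csv')
print(df2['A'].dtype)                       # int64 -- the strings are gone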
You may advise me to convert the column back to string once it is reloaded in Jupyter; however, that won't work because I don't know beforehand which columns may be affected by this behaviour.
Is there a way to avoid this strange situation?
Upvotes: 6
Views: 17088
Reputation: 307
You can set the quoting parameter in to_csv; take a look at this example:
import csv
import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)
df.to_csv('test.csv', sep='\t', quoting=csv.QUOTE_NONNUMERIC)  # quote every non-numeric field
The created csv file is:
"" 0 1 2
0 "a" "1.2" "4.2"
1 "b" "70" "0.03"
2 "x" "5" "0"
You can also set the quote character with the quotechar parameter, e.g. quotechar="'" will produce this output:
''      0       1       2
0       'a'     '1.2'   '4.2'
1       'b'     '70'    '0.03'
2       'x'     '5'     '0'
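Note that the quoting only preserves the string/number distinction in the file itself; as a minimal sketch, if you read the file back with Python's csv module and the same quoting settings (assuming the last example was also written to test.csv with quotechar="'"), quoted fields come back as str while unquoted fields (index and header) come back as float:
import csv

with open('test.csv', newline='') as f:
    reader = csv.reader(f, delimiter='\t', quotechar="'", quoting=csv.QUOTE_NONNUMERIC)
    for row in reader:
        print(row)   # e.g. [0.0, 'a', '1.2', '4.2'] -- the numeric-looking strings stay strings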
Upvotes: 6
Reputation: 164623
One way is to store your dtypes separately and load them together with your data:
import pandas as pd

df = pd.DataFrame({0: ['1', '1', '1'],
                   1: [2, 3, 4]})

# Store the dtypes alongside the data
df.dtypes.to_frame('types').to_csv('types.csv')
df.to_csv('file.csv', index=False)

# Reload the dtypes and apply them when reading the data back
df_types = pd.read_csv('types.csv')['types']
df = pd.read_csv('file.csv', dtype=df_types.to_dict())
print(df.dtypes)
# 0 object
# 1 int64
# dtype: object
You may wish to consider pickle, which guarantees that your dataframe is stored and retrieved unchanged:
df.to_pickle('file.pkl')
df = pd.read_pickle('file.pkl')
print(df.dtypes)
# 0 object
# 1 int64
# dtype: object
Upvotes: 5