mouni93

Reputation: 167

pandas to_csv converts str column to int(or float)

As the title says, I noticed that pandas 'to_csv' automatically turns columns that contain only numeric strings into numbers (float or int). I am creating a dataframe in a Jupyter notebook with a column ['A'] filled with the value '1', so the dataframe consists of a single column of strings '1'. When I write the dataframe to a CSV file with 'to_csv', the output file is a single column full of integers 1.
You may advise me to convert the column back to string after reloading it in Jupyter. However, that won't work, because I don't know beforehand which columns will be affected by this behaviour. Is there a way to avoid this?
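A minimal reproduction sketch of the behaviour described above. It assumes an in-memory buffer (io.StringIO) in place of a real file, and the column name 'A' mirrors the question. It shows that the CSV text itself carries no type information; the integer dtype appears when read_csv infers types from that text.

```python
import io

import pandas as pd

# A single column holding the *string* '1' in every row.
df = pd.DataFrame({"A": ["1", "1", "1"]})

# to_csv writes plain text; CSV has no notion of column types.
text = df.to_csv(index=False)
print(text.splitlines())  # ['A', '1', '1', '1']

# read_csv infers a type for each column from the text,
# so the column comes back as integers rather than strings.
df2 = pd.read_csv(io.StringIO(text))
print(df2["A"].tolist())  # [1, 1, 1]
```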

Upvotes: 6

Views: 17088

Answers (2)

farkas

Reputation: 307

You can set the quoting parameter in to_csv; take a look at this example:

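import csv
import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)
# QUOTE_NONNUMERIC wraps every non-numeric field (here, every string value) in quotes
df.to_csv('test.csv', sep='\t', quoting=csv.QUOTE_NONNUMERIC)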

The created csv file is:

""  0   1   2
0   "a" "1.2"   "4.2"
1   "b" "70"    "0.03"
2   "x" "5" "0"

You can also set the quote character with the quotechar parameter; e.g. quotechar="'" produces this output:

''  0   1   2
0   'a' '1.2'   '4.2'
1   'b' '70'    '0.03'
2   'x' '5' '0'
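The quoting only helps if the reader honours the quotes when the file is loaded again. One possible round trip, sketched under assumptions: the column names 's' and 'n' are hypothetical, an in-memory buffer replaces the file, and Python's standard csv module is used for reading instead of pd.read_csv. With quoting=csv.QUOTE_NONNUMERIC, csv.reader keeps quoted fields as str and converts every unquoted field to float.

```python
import csv
import io

import pandas as pd

# Hypothetical frame: 's' holds strings, 'n' holds integers.
df = pd.DataFrame({"s": ["1", "1", "1"], "n": [2, 3, 4]})

# Strings (including the header labels) are written quoted; numbers are not.
text = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)

# Reading with the same flag: quoted fields stay str,
# unquoted fields are converted to float by the csv module.
rows = list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))
header, data = rows[0], rows[1:]
df2 = pd.DataFrame(data, columns=header)

print(df2.dtypes)
```

Note the trade-off: the string column survives, but integer columns come back as floats, and missing values need extra care.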

Upvotes: 6

jpp

Reputation: 164623

One way is to store your types separately and load them together with your data:

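import pandas as pd

df = pd.DataFrame({0: ['1', '1', '1'],
                   1: [2, 3, 4]})

# Save the column dtypes to a separate file alongside the data
df.dtypes.to_frame('types').to_csv('types.csv')
df.to_csv('file.csv', index=False)

# Reload the saved dtypes ({0: 'object', 1: 'int64'}) and pass them to read_csv
df_types = pd.read_csv('types.csv')['types']
df = pd.read_csv('file.csv', dtype=df_types.to_dict())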

print(df.dtypes)
# 0    object
# 1     int64
# dtype: object
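A variation on the same idea, sketched under assumptions: the column names 'A' and 'B' are hypothetical, and in-memory buffers stand in for 'types.csv' and 'file.csv'. Reading the types file with index_col=0 restores the column names as the index, so the dtype mapping is keyed by name rather than by row position, which may be more robust when the columns have string labels.

```python
import io

import pandas as pd

df = pd.DataFrame({"A": ["1", "1", "1"], "B": [2, 3, 4]})

# Save dtypes keyed by column name (the index of df.dtypes) and the data itself.
types_buf = io.StringIO()
df.dtypes.astype(str).to_frame("types").to_csv(types_buf)
data_buf = io.StringIO()
df.to_csv(data_buf, index=False)

# index_col=0 turns the saved column names back into the index,
# so to_dict() yields a {column name: dtype name} mapping.
types = pd.read_csv(io.StringIO(types_buf.getvalue()), index_col=0)["types"].to_dict()
df2 = pd.read_csv(io.StringIO(data_buf.getvalue()), dtype=types)

print(df2.dtypes)
```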

You may also wish to consider pickle, which stores the dataframe together with its dtypes, so it is guaranteed to come back unchanged:

df.to_pickle('file.pkl')
df = pd.read_pickle('file.pkl')

print(df.dtypes)
# 0    object
# 1     int64
# dtype: object
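To check the "unchanged" claim directly, a small sketch (assumption: a temporary directory replaces the 'file.pkl' path) compares the original and restored frames with pandas' own testing helper, which raises if values, dtypes, index or column labels differ. Keep in mind that pickle files are Python-specific, and the pandas documentation warns that loading pickled data from untrusted sources can be unsafe.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({0: ["1", "1", "1"], 1: [2, 3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Raises AssertionError if anything (values, dtypes, labels) differs.
pd.testing.assert_frame_equal(df, restored)
```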

Upvotes: 5
