Serge Kashlik

Reputation: 413

Line truncates when exporting pandas dataframe to csv

I am trying to export a dataframe to a csv file so I can upload it into SAS later. However, one of the lines gets truncated even though it does not reach the csv cell limit of 32k characters. The code below demonstrates the problem:

import pandas as pd
import numpy as np

bin1 = np.array(['finance'])
bin2 = np.array(['other', 'metallurgy', 'car trade/manuf', 'real_estate', 'transport', 'construction'])
bin3 = np.array(['trade whl', 'trade ret', 'tourism', 'food'])

data = {'var':'emp_sector','bin':[bin1,bin2,bin3]}
df = pd.DataFrame(data)
print(df)


          var                                                bin
0  emp_sector                                          [finance]
1  emp_sector  [other, metallurgy, car trade/manuf, real_esta...
2  emp_sector              [trade whl, trade ret, tourism, food]

path = 'Y:/path/test.csv'
df.to_csv(path, encoding='ANSI')

After exporting the df I open the csv file and see this:

,var,bin
0,emp_sector,['finance']
1,emp_sector,"['other' 'metallurgy' 'car trade/manuf' 'real_estate' 'transport'
 'construction']"
2,emp_sector,['trade whl' 'trade ret' 'tourism' 'food']

For some reason 'construction' is moved to the next line. Exporting to .txt gives the same results.

Can anyone help please?

Upvotes: 2

Views: 1082

Answers (1)

bas

Reputation: 15462

I think I found the culprit. If we look at the string representation of your arrays there's a problem:

>>> bin3.__str__()
"['trade whl' 'trade ret' 'tourism' 'food']"

>>> bin2.__str__()
"['other' 'metallurgy' 'car trade/manuf' 'real_estate' 'transport'\n 'construction']"

We see a newline character (\n) in the output of bin2.__str__(), which would explain why to_csv adds a newline in its output.
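
As far as I can tell, the newline comes from NumPy's print settings: array strings are wrapped once they exceed the `linewidth` print option, which defaults to 75 characters. A small sketch to check this, reusing `bin2` from the question (the names `default_repr` and `wide_repr` are just for illustration):

```python
import numpy as np

bin2 = np.array(['other', 'metallurgy', 'car trade/manuf',
                 'real_estate', 'transport', 'construction'])

# With the default print options the string is wrapped, so it contains '\n'.
default_repr = str(bin2)
print('\n' in default_repr)

# Temporarily raising linewidth keeps the whole array on one line.
with np.printoptions(linewidth=1000):
    wide_repr = str(bin2)
print('\n' in wide_repr)
```

Changing print options only affects how the array is rendered as text, so it is more of a diagnostic than a fix for the export itself.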

I found that if we first convert to a list the newline character disappears:

>>> bin2.tolist().__str__()
"['other', 'metallurgy', 'car trade/manuf', 'real_estate', 'transport', 'construction']"

So a solution could be to convert your bins from arrays to lists before you call to_csv.
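
Something along these lines should work. It is only a sketch: it writes to an in-memory `io.StringIO` buffer rather than your real path so it can be run anywhere, and the `buffer` and `csv_text` names are mine.

```python
import io

import numpy as np
import pandas as pd

bin1 = np.array(['finance'])
bin2 = np.array(['other', 'metallurgy', 'car trade/manuf',
                 'real_estate', 'transport', 'construction'])
bin3 = np.array(['trade whl', 'trade ret', 'tourism', 'food'])

df = pd.DataFrame({'var': 'emp_sector', 'bin': [bin1, bin2, bin3]})

# Convert each NumPy array to a plain Python list before exporting.
# The string form of a list is not wrapped, so no '\n' ends up in the cell.
df['bin'] = df['bin'].apply(lambda arr: arr.tolist())

# Write to an in-memory buffer here; pass your file path to to_csv instead.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Each row should now sit on a single line. Note that the list representation contains commas, so to_csv wraps those cells in double quotes.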

Upvotes: 1
