Reputation: 413
I am trying to export a dataframe to a csv file so I can upload it into SAS later. However, one of the lines gets truncated even though it does not reach the csv cell limit of 32k characters. The code below demonstrates the problem:
import pandas as pd
import numpy as np
bin1 = np.array(['finance'])
bin2 = np.array(['other', 'metallurgy', 'car trade/manuf', 'real_estate', 'transport', 'construction'])
bin3 = np.array(['trade whl', 'trade ret', 'tourism', 'food'])
data = {'var':'emp_sector','bin':[bin1,bin2,bin3]}
df = pd.DataFrame(data)
print(df)
var bin
0 emp_sector [finance]
1 emp_sector [other, metallurgy, car trade/manuf, real_esta...
2 emp_sector [trade whl, trade ret, tourism, food]
path = 'Y:/path/test.csv'
df.to_csv(path, encoding='ANSI')
After exporting the df I open the csv file and see this:
,var,bin
0,emp_sector,['finance']
1,emp_sector,"['other' 'metallurgy' 'car trade/manuf' 'real_estate' 'transport'
'construction']"
2,emp_sector,['trade whl' 'trade ret' 'tourism' 'food']
For some reason 'construction' is moved to the next line. Exporting to .txt gives the same results.
Can anyone help please?
Upvotes: 2
Views: 1082
Reputation: 15462
I think I found the culprit. If we look at the string representation of your arrays there's a problem:
>>> bin3.__str__()
"['trade whl' 'trade ret' 'tourism' 'food']"
>>> bin2.__str__()
"['other' 'metallurgy' 'car trade/manuf' 'real_estate' 'transport'\n 'construction']"
We see a newline character (\n) in the output of bin2.__str__(), which would explain why to_csv adds a newline in its output.
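As far as I can tell, the newline comes from NumPy's print options: by default an array is wrapped at a line width of 75 characters when it is converted to a string. Here is a minimal sketch to check this, using np.array2string with a much larger max_line_width (the value 1_000_000 is just an arbitrarily large number picked for illustration):

```python
import numpy as np

bin2 = np.array(['other', 'metallurgy', 'car trade/manuf',
                 'real_estate', 'transport', 'construction'])

# Default string conversion: wrapped according to NumPy's print options
wrapped = str(bin2)

# Same array, but with a line width large enough that no wrapping is needed
unwrapped = np.array2string(bin2, max_line_width=1_000_000)

print(repr(wrapped))
print(repr(unwrapped))
```

With default print options, the first repr should contain \n and the second should not.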
I found that if we first convert to a list the newline character disappears:
>>> bin2.tolist().__str__()
"['other', 'metallurgy', 'car trade/manuf', 'real_estate', 'transport', 'construction']"
So a solution could be to convert your bins from arrays to lists before you call to_csv.
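Applied to the dataframe from the question, this could look roughly like the sketch below. I call to_csv without a path here so that it returns the csv text as a string, which makes the result easy to inspect; in your case you would pass your own path and encoding instead.

```python
import numpy as np
import pandas as pd

bin1 = np.array(['finance'])
bin2 = np.array(['other', 'metallurgy', 'car trade/manuf',
                 'real_estate', 'transport', 'construction'])
bin3 = np.array(['trade whl', 'trade ret', 'tourism', 'food'])

df = pd.DataFrame({'var': 'emp_sector', 'bin': [bin1, bin2, bin3]})

# Convert every array in the column to a plain Python list,
# whose string representation is not wrapped
df['bin'] = df['bin'].apply(lambda arr: arr.tolist())

# With no path given, to_csv returns the csv content as a string
csv_text = df.to_csv()
print(csv_text)
```

Each row should now end up on a single line in the output.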
Upvotes: 1