X.G

Reputation: 111

Pandas to_csv leads to extra lines

The data frame has 906133 rows:

df.shape

(906133, 24)

I tried to save it as a CSV file:

df.to_csv('df.csv',encoding='utf-8-sig',index=False)

Then I read it back:

test_lines = pd.read_csv('df.csv')

However, it now has many more rows:

test_lines.shape

(16512050, 24)

After some inspection, the extra lines mainly contain a series of dots (...........) or commas (,,,,,,,,,,,,,,,). If I pass sep='\t' to both the saving and the reading commands, the number of extra lines decreases, but some still remain.
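One way to narrow this down, offered as a sketch rather than a confirmed diagnosis: check whether any cells in the original data frame contain embedded line breaks, since those can end up as extra physical lines in the file. The helper name rows_with_line_breaks is hypothetical, and the check assumes that stray \r or \n characters are worth ruling out.

```python
import pandas as pd


def rows_with_line_breaks(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows in which at least one cell contains a carriage return or line feed."""
    mask = pd.Series(False, index=df.index)
    for col in df.columns:
        # astype(str) lets the check run on every column, not only text columns.
        mask |= df[col].astype(str).str.contains(r"[\r\n]", regex=True)
    return df[mask]
```

If rows_with_line_breaks(df) comes back empty, embedded line breaks are probably not the cause and the problem lies elsewhere.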

Upvotes: 3

Views: 1754

Answers (1)

Kube Kubow

Reputation: 428

I ran into a similar problem, although I was constructing the CSV from scratch (not importing it).

My blank lines disappeared after I used these parameters:

df.to_csv('df.csv', mode='w', encoding='utf-8', index=False, line_terminator='\n')

I suspect line_terminator was the main culprit, but the index parameter was also responsible for some extra separators. I hope this helps in your case too. As @Vishnudev wrote, we do not have your dataset, so we cannot test this. If you share it, we can confirm.
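Note that pandas 1.5 renamed this keyword to lineterminator, and line_terminator was removed in pandas 2.0, so the call above needs the new spelling on current versions. As a way to test a fix without the full dataset, here is a minimal round-trip sketch. The helper name round_trip and the sample data are made up for illustration, and quoting every field with csv.QUOTE_ALL is an extra precaution I am assuming is acceptable, not something the call above does.

```python
import csv
import os
import tempfile

import pandas as pd


def round_trip(df: pd.DataFrame) -> pd.DataFrame:
    """Write df to a temporary CSV file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "df.csv")
        # QUOTE_ALL wraps every field in quotes, so embedded \r or \n
        # characters stay inside their field when the file is parsed.
        df.to_csv(path, encoding="utf-8", index=False, quoting=csv.QUOTE_ALL)
        return pd.read_csv(path, encoding="utf-8")


# Hypothetical sample with line breaks inside text cells.
sample = pd.DataFrame(
    {"id": [1, 2, 3], "text": ["plain", "two\nlines", "carriage\rreturn"]}
)
restored = round_trip(sample)
print(sample.shape, restored.shape)
```

If the two shapes match on a small slice of the real data (for example df.head(1000)) but not on the whole frame, bisecting the rows should locate the cells that break the file.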

Upvotes: 1
