Grobanix
Grobanix

Reputation: 169

Pandas read_excel() line break in cell

I use pandas in python to to extract one excel sheet to a csv file:

import pandas as pd 
import sys
    
read_file = pd.read_excel(sys.argv[1], engine='openpyxl', sheet_name = sys.argv[3])
read_file.to_csv (sys.argv[2],  index = None, header=False, sep=';')

The problem is that line breaks in a cell (alt + enter) will cause a new line in the csv file. Instead it should just seperate the text with a whitespace

Upvotes: 2

Views: 4784

Answers (1)

Paul Brennan
Paul Brennan

Reputation: 2696

mystring = mystring.replace('\n', ' ').replace('\r', '') will get rid of the offenders for a string and there is a matching pandas function

so

import pandas as pd 
import sys
    
read_file = pd.read_excel(sys.argv[1], engine='openpyxl', sheet_name = sys.argv[3])
read_file = read_file.replace('\n', ' ').replace('\r', '')
read_file.to_csv (sys.argv[2],  index = None, header=False, sep=';')

will get what you want

Note, windows uses carriage return and line feed. So one gets replaced with space and the other one with nothing. Hence the difference in the above replace aguments.

Upvotes: 1

Related Questions