Martin Pszczola

Reputation: 1

Identifying and removing extra tabs in a tab delimited text file

I have text files with 120 columns and thousands of rows, where the delimiter is a tab. Some rows contain an extra tab, which makes those rows appear to have 121 columns. The location of this extra tab is not known to be the same across the text files.

I am wondering if anyone has any thoughts on efficiently locating the extra tab and removing it programmatically.

Upvotes: 0

Views: 601

Answers (1)

mozway

Reputation: 260430

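You can pass a regular expression as the separator in pandas' read_csv.

Use '\t+' (one or more consecutive tabs), so that a doubled tab is treated as a single delimiter. Note that this also collapses genuinely empty fields, so it only fits files where no cell is expected to be empty: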

import pandas as pd
df = pd.read_csv('your_file.csv', sep=r'\t+', engine='python')
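If you also want to see which rows are affected before loading the file, a minimal sketch in plain Python might look like the following. It assumes the extra tab sits directly next to another tab (the same assumption the '\t+' separator makes) and that no field is legitimately empty. The helper names `find_bad_rows` and `collapse_tabs` are hypothetical, and the expected column count of 120 is taken from the question.

```python
import re

EXPECTED_COLUMNS = 120  # from the question; adjust for your files


def find_bad_rows(lines, expected=EXPECTED_COLUMNS):
    """Return (line_number, field_count) for rows whose field count differs from `expected`."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # A row with n tabs has n + 1 fields; ignore the trailing newline.
        count = line.rstrip("\r\n").count("\t") + 1
        if count != expected:
            bad.append((number, count))
    return bad


def collapse_tabs(line):
    """Replace each run of two or more consecutive tabs with a single tab.

    Assumption: an extra tab always appears next to another tab, and
    no field is meant to be empty. Otherwise this would merge columns.
    """
    return re.sub(r"\t{2,}", "\t", line)
```

To use it, read the file with `open(path, newline="")` and pass the file object (or a list of its lines) to `find_bad_rows`, then write `collapse_tabs(line)` for each line to a new file. Rows that are still reported as bad afterwards have a stray tab somewhere else (for example inside a text value) and need to be inspected by hand.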

Upvotes: 2
