Data

Reputation: 769

Ignore delimiters at end of row in pandas read_csv

I have data in CSV files whose columns are separated by a single tab character. Most rows contain just one tab, like this:

A\tB

Some rows contain extra tabs at the end of the row, like this:

A\tB\t\t

Hence, if I do pd.read_csv(filePath, sep='\t'), I get an error: ParserError: Error tokenizing data. C error: Expected 2 fields in line XXX, saw 4. That's because some rows contain 3 tabs, which the parser reads as 4 fields.
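To see where the "saw 4" in the message comes from, here is a small sketch using only plain string methods. The row value is copied from the example above; the variable names are just for illustration.

```python
# A row with two trailing tabs, as in the example above.
row = "A\tB\t\t"

# Splitting on the delimiter mirrors how a parser counts fields:
# every tab ends one field and starts the next, even an empty one.
fields = row.split("\t")

tab_count = row.count("\t")
field_count = len(fields)

print(tab_count)    # number of delimiters in the row
print(field_count)  # number of fields, always one more than the delimiters
print(fields)       # the trailing fields are empty strings
```

So a row with three tabs is counted as four fields, two of them empty, while the other rows have only two.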

So how can I ignore any extra tabs at the end of a row?

Upvotes: 0

Views: 1408

Answers (1)

Corralien

Reputation: 120391

Use io.StringIO to clean the file contents before passing them to pandas:

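import io

import pandas as pd

with open('data.txt') as table:
    # Drop only trailing tabs and the newline, so a leading empty field is kept
    cleaned = '\n'.join(line.rstrip('\t\n') for line in table)

# read_table uses a tab separator by default
buffer = io.StringIO(cleaned)
df = pd.read_table(buffer, header=None)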

Output:

>>> df
   0  1
0  A  B
1  A  B
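The same idea can be wrapped in a small helper and tried on in-memory rows. This is only a sketch: read_trimmed_tsv is a hypothetical name, not a pandas function, and the sample rows are made up for illustration. It strips with rstrip rather than strip so that a row beginning with an empty field keeps its column alignment.

```python
import io

import pandas as pd


def read_trimmed_tsv(lines):
    """Parse tab-separated lines after removing trailing tabs and line endings.

    Hypothetical helper for illustration; not part of pandas.
    """
    # rstrip only touches the right-hand side, so a leading tab
    # (an empty first field) is left in place.
    cleaned = "\n".join(line.rstrip("\t\r\n") for line in lines)
    return pd.read_csv(io.StringIO(cleaned), sep="\t", header=None)


# Made-up sample: a normal row, a row with extra trailing tabs,
# and a row whose first field is empty.
sample = ["A\tB\n", "A\tB\t\t\n", "\tB\t\n"]
df = read_trimmed_tsv(sample)
print(df)
```

Any iterable of lines works here, so an open file object can be passed in place of the sample list. Note that this only handles extra tabs at the end of a row; a row with too many tabs in the middle would still fail to parse.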

Upvotes: 2
