Reputation: 769
I have data in CSV files. I am separating the data into columns using a single tab character. Most of the rows just contain one tab character, like this:
A\tB
Some rows contain extra tabs at the end of the row, like this:
A\tB\t\t
Hence, if I do pd.read_csv(filePath, sep='\t'), I get an error: ParserError: Error tokenizing data. C error: Expected 2 fields in line XXX, saw 4. That's because some rows contain three tabs, which pandas reads as 4 fields.
So how can I ignore the tabs at the end of a row, if it contains extra tabs?
Upvotes: 0
Views: 1408
Reputation: 120391
Use io.StringIO to clean the file contents before passing them to pandas:
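import io
import pandas as pd

# Strip surrounding whitespace, including the trailing tabs, from each line.
with open('data.txt') as table:
    buffer = io.StringIO('\n'.join(line.strip() for line in table))

# read_table splits on tabs by default; the data has no header row.
df = pd.read_table(buffer, header=None)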
Output:
>>> df
   0  1
0  A  B
1  A  B
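One caveat: line.strip() also removes leading whitespace, so a row whose first field is empty (i.e. a row that starts with a tab) would lose that column. If that can happen in your data, a variant that only trims the end of each line may be safer. Below is a minimal sketch of that idea using only the standard library; strip_trailing_tabs is a hypothetical helper name and the sample rows are made up for illustration.

```python
import io


def strip_trailing_tabs(lines):
    """Return a StringIO whose lines have trailing tabs and line endings removed.

    Hypothetical helper: only the right-hand side of each line is trimmed,
    so a leading tab (an empty first field) is preserved.
    """
    cleaned = (line.rstrip("\t\r\n") for line in lines)
    return io.StringIO("\n".join(cleaned))


# Made-up sample rows: a normal row, a row with extra trailing tabs,
# and a row with an empty first field plus extra trailing tabs.
sample = ["A\tB\n", "A\tB\t\t\n", "\tB\t\t\n"]
buffer = strip_trailing_tabs(sample)
text = buffer.getvalue()
print(repr(text))
```

The returned buffer can be passed to pd.read_table(buffer, header=None) in the same way as the buffer in the snippet above; with a real file, pass the open file object instead of the sample list. Note that this also drops genuinely empty trailing fields, which is what the question asks for.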
Upvotes: 2