I have a CSV saved as data.csv that looks like this, with two columns:
Column1|Column2
Titleone|1.5
Title|two|2.5
Title3|3.6
The third line of the CSV contains an extra pipe character, |, inside the Column1 value, and that is what causes the error. I need a way to read that pipe in as part of the Column1 value for that row. When I run pd.read_csv("data.csv", sep = "|"), I get the error:
ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 3
I cannot use on_bad_lines='skip' since I'm on an old version of pandas. This is a workaround I found that seems to be a partial solution:
import pandas as pd

# A spare third column absorbs the extra field on line 3, so no row raises
col_names = ["col1", "col2", "col3"]
df = pd.read_csv("data.csv", sep="|", names=col_names)
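One way to finish the three-column workaround is to glue the spilled piece back onto col1 wherever col3 is filled. This is a minimal sketch under a few assumptions not stated in the question: each row has at most one extra pipe (two or more would still raise with only three names), the header line is skipped with skiprows=1 so it is not read as a data row, and io.StringIO stands in for data.csv.

```python
import io

import pandas as pd

# The question's sample data; io.StringIO stands in for data.csv
raw = "Column1|Column2\nTitleone|1.5\nTitle|two|2.5\nTitle3|3.6\n"

# skiprows=1 drops the header line so it is not read as a data row
df = pd.read_csv(io.StringIO(raw), sep="|", names=["col1", "col2", "col3"], skiprows=1)

# A value in col3 means the row had one extra pipe inside Column1
spilled = df["col3"].notna()
df.loc[spilled, "col1"] = df.loc[spilled, "col1"] + "|" + df.loc[spilled, "col2"].astype(str)
df.loc[spilled, "col2"] = df.loc[spilled, "col3"]

df = df.drop(columns="col3").rename(columns={"col1": "Column1", "col2": "Column2"})
# col2 held a mix of text ("two") and numbers, so convert it once the rows are fixed
df["Column2"] = pd.to_numeric(df["Column2"])
print(df)
```

With data.csv on disk, replace io.StringIO(raw) with "data.csv".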
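on_bad_lines was added in pandas 1.3.0 and deprecates error_bad_lines, so if you're on an older version of pandas, you can just use that instead. Like on_bad_lines='skip', it drops the bad line rather than raising: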
pd.read_csv("data.csv", sep = "|", error_bad_lines = False)
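If the same script has to run on both old and new pandas, one option is to pick the flag from the installed version. This is a sketch only: read_csv_skip_bad is a hypothetical helper name, and it assumes pd.__version__ starts with numeric major and minor parts (true for releases such as "1.2.5" or "2.2.0").

```python
import io

import pandas as pd


def read_csv_skip_bad(src, **kwargs):
    """Read a CSV, dropping malformed lines on both old and new pandas."""
    # Compare only major.minor; suffixes like "rc1" or ".dev0" come after these parts
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    if (major, minor) >= (1, 3):
        # on_bad_lines exists from pandas 1.3.0 onward
        return pd.read_csv(src, on_bad_lines="skip", **kwargs)
    # Older pandas: the deprecated flag does the same job
    return pd.read_csv(src, error_bad_lines=False, **kwargs)


sample = io.StringIO("Column1|Column2\nTitleone|1.5\nTitle|two|2.5\nTitle3|3.6\n")
df = read_csv_skip_bad(sample, sep="|")
print(df)
```

Either branch leaves only the Titleone and Title3 rows; the Title|two row is dropped, not repaired.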
If you want to keep the bad lines, you can also pass warn_bad_lines=True, capture the warnings pandas writes to stderr, extract the bad line numbers from them, and read those lines separately into a single column:
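import contextlib
import re
import pandas as pd

# Old pandas only (error_bad_lines/warn_bad_lines were removed in 2.0):
# capture the "Skipping line N: ..." warnings pandas writes to stderr
with open('log.txt', 'w') as log:
    with contextlib.redirect_stderr(log):
        df = pd.read_csv('data.csv', sep='|', error_bad_lines=False, warn_bad_lines=True)

# pandas reports 1-based line numbers; skiprows expects 0-based row indices
with open('log.txt') as f:
    bad_lines = [int(n) - 1 for n in re.findall(r'Skipping line (\d+)', f.read())]

# Default sep=',' puts each whole bad line in one column (if it has no commas)
df_bad_lines = pd.read_csv('data.csv', skiprows=lambda x: x not in bad_lines, squeeze=True, header=None)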
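If the goal is to keep every row with the pipe inside Column1, a different approach (not the one used above) skips the log parsing entirely: split each line on its last pipe with str.rsplit. A minimal sketch, assuming only Column1 can contain pipes, every line has at least one pipe, the second column is numeric, and the file uses no quoting; read_two_col_rsplit is a hypothetical helper name. It does not depend on the pandas version.

```python
import io

import pandas as pd


def read_two_col_rsplit(lines, sep="|"):
    """Parse two-column text where only the first column may contain `sep`."""
    # Split on the LAST separator only, so extra pipes stay in the first column
    rows = [line.rstrip("\r\n").rsplit(sep, 1) for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    df = pd.DataFrame(data, columns=header)
    # Every value starts out as text; convert the second column to numbers
    df[header[1]] = pd.to_numeric(df[header[1]])
    return df


sample = io.StringIO("Column1|Column2\nTitleone|1.5\nTitle|two|2.5\nTitle3|3.6\n")
df = read_two_col_rsplit(sample)
print(df)
```

On the real file this would be `with open("data.csv") as f: df = read_two_col_rsplit(f)`. The whole file is parsed in plain Python, so it trades read_csv's speed for control over how each line is split.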