user6883405

Reputation: 403

Reading CSV with Separator in column values

I have a CSV saved as data.csv that looks like this, with two columns:

Column1|Column2
Titleone|1.5
Title|two|2.5
Title3|3.6

The third line of the file (the second data row) contains a pipe character, |, inside the Column1 value, and that is causing the error. I need a way to read that pipe as part of the Column1 value for that row. When I run pd.read_csv("data.csv", sep = "|") I get the error: ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 3

I cannot use on_bad_lines='skip' because I'm on an old version of pandas. This is a workaround I found that seems to be a partial solution:

col_names = ["col1", "col2", "col3"]
df = pd.read_csv("data.csv", sep = "|", names = col_names)

Upvotes: 2

Views: 845

Answers (1)

Always Right Never Left

Reputation: 1481

on_bad_lines replaces error_bad_lines, which was deprecated in pandas 1.3, so if you're on an older version of pandas, you can use error_bad_lines instead (this skips the malformed line):

pd.read_csv("data.csv", sep = "|", error_bad_lines = False)

If you want to keep the bad lines, you can also use warn_bad_lines, extract the bad line numbers from the warnings, and read those lines separately into a single column:

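import contextlib
import re

import pandas as pd

# Skip malformed rows; pandas reports each skipped row on stderr, which is redirected to log.txt
with open('log.txt', 'w') as log:
    with contextlib.redirect_stderr(log):
        df = pd.read_csv('data.csv', sep = '|', error_bad_lines = False, warn_bad_lines = True)

with open('log.txt') as f:
    log_text = f.read()

# Warnings look like "Skipping line 3: expected 2 fields, saw 3".
# Capture the full line number (not only its first digit) and convert it to a 0-based row index.
bad_lines = [int(n) - 1 for n in re.findall(r'line (\d+)', log_text)]

# Read only the bad lines; with the default comma separator each line lands in a single column
df_bad_lines = pd.read_csv('data.csv', skiprows = lambda x: x not in bad_lines, squeeze = True, header = None)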
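
If the extra pipes can only ever appear in Column1, another option that does not depend on the pandas version is to split each line on its last separator only. Here is a minimal sketch, assuming the last column never contains a pipe. The name split_on_last_sep is a hypothetical helper, not part of pandas, and the sample lines simply mirror data.csv from the question:

```python
def split_on_last_sep(lines, sep="|"):
    """Split each line into [text, value], treating only the LAST sep as the delimiter."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        # rsplit with maxsplit=1 leaves any earlier separators inside the first field
        rows.append(line.rsplit(sep, 1))
    return rows


# Sample lines mirroring data.csv; in practice, pass an open file object instead
sample = ["Column1|Column2", "Titleone|1.5", "Title|two|2.5", "Title3|3.6"]
header, *records = split_on_last_sep(sample)
```

The resulting header and records could then be handed to pd.DataFrame(records, columns=header). The values stay strings, so Column2 would still need converting to a numeric type afterwards.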

Upvotes: 3
