Import a CSV with an inconsistent number of columns per row, keeping the original header, using pandas

Question

How can I read a CSV of this type and keep the original column names? Maybe generic column names could be appended to the end of the header, depending on the maximum number of columns in the body of the CSV...

a,b,c
1,2,3
1,2,3,
1,2,3,4

A simple read_csv call does not work:

tempfile = pd.read_csv(path 
                 ,index_col=None
                 ,sep=','
                 ,header=0
                 ,error_bad_lines=False
                 ,encoding = 'unicode_escape'
                 ,warn_bad_lines=True
                 )

b'Skipping line 3: expected 3 fields, saw 4
Skipping line 4: expected 3 fields, saw 4
'

I need this kind of result:

a,b,c,x1
1,2,3,NA
1,2,3,NA
1,2,3,4

Martin Evans · Accepted Answer

One approach would be to first read just the header row in and then pass these column names with your extra generic names as a parameter to pandas. For example:

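import pandas as pd
import csv

filename = "input.csv"

# Read only the first row to get the original column names
with open(filename, newline="") as f_input:
    header = next(csv.reader(f_input))

# Append generic names x1..x9 to cover up to nine extra columns
header += [f'x{n}' for n in range(1, 10)]

# on_bad_lines needs pandas >= 1.3; it replaces error_bad_lines and
# warn_bad_lines, which were removed in pandas 2.0
tempfile = pd.read_csv(filename,
                 index_col=None,
                 sep=',',
                 skiprows=1,
                 names=header,
                 on_bad_lines='warn',
                 encoding='unicode_escape',
                 )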

skiprows=1 tells pandas to skip the header row, and names holds the full list of column names to use. Rows with fewer fields than there are names are padded with NaN.

The header would then contain:

['a', 'b', 'c', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']
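If you would rather not guess the number of extra columns, one option is to scan the file once to find the widest row and generate only as many generic names as needed. The following is a minimal sketch, not a tested drop-in: the helper name read_ragged_csv and its prefix parameter are made up for illustration, and the sample data from the question is inlined through io.StringIO so that it runs without a file on disk.

```python
import csv
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without a file.
sample = "a,b,c\n1,2,3\n1,2,3,\n1,2,3,4\n"


def read_ragged_csv(source, prefix="x"):
    """Read a CSV whose body rows may be wider than its header row.

    `source` is a seekable text file object (hypothetical helper, not a
    pandas API); `prefix` is used to name the extra columns.
    """
    reader = csv.reader(source)
    header = next(reader)

    # Width of the widest body row, never less than the header width.
    width = max((len(row) for row in reader), default=len(header))
    width = max(width, len(header))

    # Original names followed by x1, x2, ... for the extra columns.
    extra = [f"{prefix}{n}" for n in range(1, width - len(header) + 1)]
    names = header + extra

    # Rewind and let pandas parse the body with the full list of names.
    source.seek(0)
    return pd.read_csv(source, skiprows=1, names=names)


df = read_ragged_csv(io.StringIO(sample))
print(df)
```

For the sample data this gives the columns a, b, c and x1, with x1 empty in the first two rows. With a real file, open(filename, newline="") could be passed in place of the StringIO object. The cost is that the file is read twice, which may matter for very large inputs.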
