314mip

Reputation: 403

Import a CSV with an inconsistent number of columns per row, keeping the original header, using pandas

How can I read a CSV like the one below and keep the original column names? Ideally, generic column names would be appended to the header, based on the maximum number of columns in the body of the CSV:

a,b,c
1,2,3
1,2,3,
1,2,3,4

A plain read_csv call does not work, because rows with extra fields are skipped:

tempfile = pd.read_csv(path,
                 index_col=None,
                 sep=',',
                 header=0,
                 error_bad_lines=False,
                 encoding='unicode_escape',
                 warn_bad_lines=True,
                 )

This prints:

b'Skipping line 3: expected 3 fields, saw 4\nSkipping line 4: expected 3 fields, saw 4\n'

I need a result like this:

a,b,c,x1
1,2,3,NA
1,2,3,NA
1,2,3,4

Upvotes: 1

Views: 1187

Answers (1)

Martin Evans

Reputation: 46759

One approach is to first read just the header row, then pass those column names, plus your extra generic names, to pandas via the names parameter. For example:

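import csv

import pandas as pd

filename = "input.csv"

# Read only the header row to get the original column names.
with open(filename, newline="") as f_input:
    header = next(csv.reader(f_input))

# Append generic names x1..x9 to cover rows with extra fields.
header += [f'x{n}' for n in range(1, 10)]

tempfile = pd.read_csv(
    filename,
    index_col=None,
    sep=',',
    skiprows=1,      # skip the original header line
    names=header,    # use the extended header instead
    encoding='unicode_escape',
    # pandas >= 1.3: replaces error_bad_lines=False / warn_bad_lines=True,
    # which were removed in pandas 2.0.
    on_bad_lines='warn',
)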

skiprows=1 tells pandas to skip the original header line, and names supplies the full list of column names to use. Rows with fewer fields than names are padded with NaN in the missing columns.

The header would then contain:

['a', 'b', 'c', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']
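
If you would rather not hard-code nine extra names, the number of generic columns can be derived from the widest row instead. The following is only a sketch of that idea: read_ragged_csv and its prefix argument are hypothetical names, and it takes the CSV as a string to stay self-contained (for a real file you would open it once to measure the rows and then pass the path to read_csv).

```python
import csv
import io

import pandas as pd


def read_ragged_csv(text, prefix="x"):
    """Hypothetical helper: read CSV text whose rows may be wider than the header."""
    # First pass: find the original header and the widest row.
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    width = max(len(row) for row in rows)

    # Generate only as many generic names as the widest row needs.
    extra = [f"{prefix}{n}" for n in range(1, width - len(header) + 1)]

    # Second pass: let pandas parse the body with the extended header.
    return pd.read_csv(io.StringIO(text), skiprows=1, names=header + extra)


sample = "a,b,c\n1,2,3\n1,2,3,\n1,2,3,4\n"
df = read_ragged_csv(sample)
print(df)
```

For the sample data in the question this yields the columns a, b, c and x1, with x1 missing in the first two rows. The trade-off is that the input is scanned twice, which may matter for very large files.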

Upvotes: 3
