Reputation: 403
How can I read a CSV like the one below and keep the original column names? Ideally, generic column names would be appended to the end of the header, depending on the maximum number of columns found in the body of the CSV.
a,b,c
1,2,3
1,2,3,
1,2,3,4
A simple read_csv call does not work:
tempfile = pd.read_csv(path
,index_col=None
,sep=','
,header=0
,error_bad_lines=False
,encoding = 'unicode_escape'
,warn_bad_lines=True
)
b'Skipping line 3: expected 3 fields, saw 4\nSkipping line 4: expected 3 fields, saw 4\n'
I need a result like this:
a,b,c,x1
1,2,3,NA
1,2,3,NA
1,2,3,4
Upvotes: 1
Views: 1187
Reputation: 46759
One approach would be to first read just the header row, and then pass those column names, plus your extra generic names, to pandas through the names parameter. For example:
import csv

import pandas as pd

filename = "input.csv"

# Read only the header row to get the original column names.
with open(filename, newline="") as f_input:
    header = next(csv.reader(f_input))

# Append generic names (x1..x9) for any extra columns in the body.
header += [f'x{n}' for n in range(1, 10)]

tempfile = pd.read_csv(
    filename,
    index_col=None,
    sep=',',
    skiprows=1,
    names=header,
    on_bad_lines='warn',  # pandas >= 1.3; replaces error_bad_lines/warn_bad_lines
    encoding='unicode_escape',
)
skiprows=1 tells pandas to skip over the header row, and names holds the full list of column headers to use.
The header would then contain:
['a', 'b', 'c', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']
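The fixed range of nine extra names is only a guess at how wide the body might be. If the header should grow to match the widest row exactly, as the question asks, one option is to scan the file once to find the maximum field count before calling pandas. The following is a minimal sketch of that idea, not a tested drop-in: the function name read_ragged_csv, its prefix parameter, and the in-memory sample data are all hypothetical, and a real file would be opened with open(path, newline="") instead of io.StringIO.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file from the question.
data = "a,b,c\n1,2,3\n1,2,3,\n1,2,3,4\n"


def read_ragged_csv(buffer, prefix="x"):
    """Read a CSV whose body rows may have more fields than its header."""
    rows = csv.reader(buffer)
    header = next(rows)

    # First pass: find the widest row (the header counts too).
    width = max([len(header)] + [len(row) for row in rows])

    # Pad the header with generic names: x1, x2, ...
    extra = width - len(header)
    names = header + [f"{prefix}{n}" for n in range(1, extra + 1)]

    # Second pass: rewind and let pandas parse with the full name list.
    buffer.seek(0)
    return pd.read_csv(buffer, skiprows=1, names=names)


df = read_ragged_csv(io.StringIO(data))
print(df)
```

Because the name list now covers the widest row, no line is treated as bad, so the bad-line options are no longer needed. The trade-off is that the file is read twice, which may matter for very large inputs.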
Upvotes: 3