Pandas read_csv create columns based on maximum number of delimiters in a row

Question

Let's say I have a (badly formed) CSV like this:

header1, header2, header3
value1, value2, value3, value4

I'd like to load this into a dataframe. However

pd.read_csv(file_data, index_col=False)

drops value4:

header1	header2	header3
value1	value2	value3

and

pd.read_csv(file_data)

leaves me with no way to tell whether the index value came from value1 in the CSV file or was auto-assigned by pandas.

Is there a way to have pandas just create dummy columns on the end based on the row with the maximum number of delimiters?

Gusti Adli · Accepted Answer

If the problem is only the header, setting the header parameter to None will solve your problem.

pd.read_csv(file_data, header=None)

If the number of delimiters differs from row to row, you need to read each line yourself using the open() function.

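import pandas as pd

# Split each line on commas ourselves; rows may have different lengths.
with open('test.csv', 'r') as f:
    rows = [line.strip().split(',') for line in f]

# DataFrame pads the shorter rows with None up to the widest row.
df = pd.DataFrame(rows)
print(df)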

Output (I added "1,2,3,4,5,6" and "11,12,13" after the last row):

         0        1        2       3     4     5
0  header1  header2  header3          None  None
1   value1   value2   value3  value4  None  None
2        1        2        3       4     5     6
3       11       12       13    None  None  None
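
If you would rather let pandas do the parsing, one possible alternative is to count the fields in the widest row first and then pass that many column names to read_csv, so shorter rows get padded with NaN. This is only a sketch under some assumptions: read_ragged_csv and the "dummy_<n>" column names are made up for illustration, the sample text mirrors the CSV from the question, and io.StringIO stands in for a real file.

```python
import csv
import io

import pandas as pd


def read_ragged_csv(text, delimiter=","):
    """Hypothetical helper: load CSV text whose rows have differing field counts."""
    # First pass: find the number of fields in the widest row.
    max_fields = max(
        len(row) for row in csv.reader(io.StringIO(text), delimiter=delimiter)
    )
    # Second pass: explicit names make pandas pad shorter rows with NaN.
    return pd.read_csv(
        io.StringIO(text),
        header=None,
        names=list(range(max_fields)),
        sep=delimiter,
        skipinitialspace=True,  # the sample has a space after each comma
    )


# Assumed sample data, copied from the question.
sample = "header1, header2, header3\nvalue1, value2, value3, value4\n"
df = read_ragged_csv(sample)

# Optionally promote the first row to column labels and give the
# unlabelled trailing columns placeholder names (hypothetical naming scheme).
header = df.iloc[0]
labelled = df.iloc[1:].reset_index(drop=True)
labelled.columns = [
    h if pd.notna(h) else f"dummy_{i}" for i, h in enumerate(header)
]
print(labelled)
```

The file is read twice here, which is fine for small files; for large ones you may prefer the open() approach above, which only reads it once.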
