Reputation: 748
Let's say I have a (badly formed) CSV like this:
header1, header2, header3
value1, value2, value3, value4
I'd like to load this into a dataframe. However
pd.read_csv(file_data, index_col=False)
drops value4:
header1 | header2 | header3 |
---|---|---|
value1 | value2 | value3 |
and
pd.read_csv(file_data)
leaves me with no way to tell whether the index value came from value1 in the CSV file or was auto-assigned by pandas.
Is there a way to have pandas just create dummy columns on the end based on the row with the maximum number of delimiters?
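For example, for the file above the result would ideally look something like this (the dummy column name is just a placeholder):
header1 | header2 | header3 | dummy1 |
---|---|---|---|
value1 | value2 | value3 | value4 |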
Upvotes: 0
Views: 319
Reputation: 1213
If the problem is only the header, setting the header parameter to None will solve it:
pd.read_csv(file_data, header=None)
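Another option, assuming you can pre-scan the file for the widest row (and no field contains a quoted comma), is to hand read_csv an explicit names list that is wide enough; shorter rows are then padded with NaN, and the original header line just becomes an ordinary first row. A rough sketch:
import pandas as pd

# Pre-scan the file for the widest row, assuming ',' is the only delimiter
# and is never quoted inside a field.
with open('test.csv', 'r') as f:
    max_fields = max(line.count(',') + 1 for line in f)

# Give read_csv enough column names; rows with fewer fields are padded with NaN.
df = pd.read_csv('test.csv', header=None, names=list(range(max_fields)),
                 skipinitialspace=True)  # the sample file has spaces after the commas
print(df)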
Alternatively, if the number of delimiters differs from row to row, you can read each line yourself with the built-in open() function:
import pandas as pd

with open('test.csv', 'r') as f:
    # Each row keeps however many fields it actually has.
    df = [i.strip().split(',') for i in f.readlines()]
df = pd.DataFrame(df)
print(df)
Output: (I added "1,2,3,4,5,6\n" and "11,22,33\n" after the last row)
         0        1        2       3     4     5
0  header1  header2  header3    None  None  None
1   value1   value2   value3  value4  None  None
2        1        2        3       4     5     6
3       11       22       33    None  None  None
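If you then want the original header back, with placeholder names for the overflow columns (roughly what the question asks for), something along these lines should work on the df built above; the extra_<n> names are made up:
# Promote the first row to the header, inventing names for the overflow columns.
header = df.iloc[0].tolist()
df = df.iloc[1:].reset_index(drop=True)
df.columns = [h.strip() if h is not None else f'extra_{i}'  # strip() removes the space after each comma
              for i, h in enumerate(header)]
print(df)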
Upvotes: 1