Reputation: 75
This should be simple but... I have a botched up csv with commas used within fields. Fortunately though, this csv only has three columns and the surplus commas are all in the middle column - so if I would manage to delete all comma but the first and the last in each line, I should be fine. How would I get csv reader to do this?
Upvotes: 0
Views: 291
Reputation: 26211
Sometimes, you want a pass-through solution that fixes files on-the-fly while reading, without generating "fixed" files, e.g if you want to directly read the data using e.g. pandas.read_csv(...)
. In that case, you can do this:
def fix_commas(csv_file):
with open(csv_file) as f:
buf = f.read()
buf = '\n'.join([re.sub(r',,+', ',', s) for s in buf.splitlines()])
return io.StringIO(buf)
# and then
df = pd.read_csv(fix_commas(filename), ...)
Example:
txt = """
first,second,third
a,,b,bbbb
c,,,,,d,,,,,,,e
f,g,h
"""
with open('test.csv', 'w') as f:
f.write(txt)
# now test:
df = pd.read_csv(fix_commas('test.csv'))
Result (in df
):
first second third
0 a b bbbb
1 c d e
2 f g h
Upvotes: 0
Reputation: 14233
with open('bad.csv') as f, open('good.csv', 'w') as fout:
for line in f:
first, *middle, last = line.split(',')
fout.write(f'{first},"{",".join(middle)}",{last}')
Upvotes: 3