user3647558
user3647558

Reputation: 75

python csv, delete all commas but the first and last per line

This should be simple but... I have a botched up csv with commas used within fields. Fortunately though, this csv only has three columns and the surplus commas are all in the middle column - so if I would manage to delete all comma but the first and the last in each line, I should be fine. How would I get csv reader to do this?

Upvotes: 0

Views: 291

Answers (2)

Pierre D
Pierre D

Reputation: 26211

Sometimes, you want a pass-through solution that fixes files on-the-fly while reading, without generating "fixed" files, e.g if you want to directly read the data using e.g. pandas.read_csv(...). In that case, you can do this:

def fix_commas(csv_file):
    with open(csv_file) as f:
        buf = f.read()
    buf = '\n'.join([re.sub(r',,+', ',', s) for s in buf.splitlines()])
    return io.StringIO(buf)

# and then
df = pd.read_csv(fix_commas(filename), ...)

Example:

txt = """
first,second,third
a,,b,bbbb
c,,,,,d,,,,,,,e
f,g,h
"""

with open('test.csv', 'w') as f:
    f.write(txt)

# now test:
df = pd.read_csv(fix_commas('test.csv'))

Result (in df):

  first second third
0     a      b  bbbb
1     c      d     e
2     f      g     h

Upvotes: 0

buran
buran

Reputation: 14233

with open('bad.csv') as f, open('good.csv', 'w') as fout:
    for line in f:
        first, *middle, last = line.split(',')
        fout.write(f'{first},"{",".join(middle)}",{last}')

Upvotes: 3

Related Questions