Zuenie

Reputation: 973

Remove last column from large CSV file

I have a large CSV file that I would like to upload to Postgres using DataGrip. When I upload it, I get the following error for a few records:

11807:98: actual: value separator (,), expected: record separator (\n) 

The problem is that only 19 columns are recognized, but a very few records contain 20 values. Those records are not imported at all, and I do want all rows in my database.

I thought this would be a piece of cake in Python. So I started working with pandas and loaded the data with the following line:

 df = pd.read_csv('filename.csv', sep='delimiter', header=None)

That does load the file. It is, however, much more difficult than I expected to either remove all values that come after the 19th comma on each line, or to just add a full 20th column that I can later delete. This feels like a trivial question; I just need a push in the right direction.
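
Spelled out, the second option would be something like the following sketch (assuming at most 20 fields per row, per the error above):

import pandas as pd

# naming 20 columns up front lets pandas parse both 19- and 20-field rows;
# 19-field rows simply get NaN in the last column
df = pd.read_csv('filename.csv', header=None, names=range(20))

# drop the helper 20th column afterwards
df = df.drop(columns=[19])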

Upvotes: 0

Views: 1269

Answers (2)

Mike Organek

Reputation: 12494

Reading it into a dataframe that way buys you nothing: 'delimiter' never matches as a separator, so each cell will contain an entire row of your file.

Use the csv module instead, because it does not require every row to have the same number of fields.

import csv

# stream the file record by record, keeping only the first 19 fields of each
with open('filename.csv', 'r', newline='') as i, open('fixed.csv', 'w', newline='') as o:
    cr = csv.reader(i)
    cw = csv.writer(o)
    for row in cr:
        cw.writerow(row[0:19])
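
If you would rather keep a placeholder 20th column (the other option the question mentions) instead of truncating, the same loop can pad the short rows. A sketch, assuming 20 is the maximum field count (the padded.csv output name is just illustrative):

with open('filename.csv', 'r', newline='') as i, open('padded.csv', 'w', newline='') as o:
    cr = csv.reader(i)
    cw = csv.writer(o)
    for row in cr:
        # pad 19-field rows with an empty string so every record has 20 fields
        cw.writerow(row + [''] * (20 - len(row)))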


Upvotes: 3

Rob Raymond

Reputation: 31206

This generates a demo dataframe with 30 columns, then drops every column after the 20th:

df = pd.DataFrame({f"col{i}":[1,2,3] for i in range(30)})

df = df.drop(columns=[c for c in df.columns[20:]])
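
For what it's worth, positional slicing does the same thing in one step (a variant, not part of the original answer); for the question's file, the slice bound would be 19 rather than 20:

df = df.iloc[:, :20]  # keep only the first 20 columns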

Upvotes: 3
