Zuenie

Reputation: 973

Remove last column from large CSV file

I have a large CSV file that I would like to upload to Postgres using DataGrip. When I upload it, I get the following error for a few records:

11807:98: actual: value separator (,), expected: record separator (\n) 

The problem is that only 19 columns are recognized, but a very few records contain 20 values. Those records are not imported at all, and I do want all rows in my database.

I thought this would be a piece of cake in Python. So I started working with pandas and loaded the data with the following line:

 df = pd.read_csv('filename.csv', sep='delimiter', header=None)

That does load the file. It is, however, much more difficult than I expected to either remove all values that come after the 19th comma on each line, or to just add a full 20th column that I can later delete. This feels like a trivial question; I just need a push in the right direction.
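
Spelled out, the second option would be something like the following sketch (assuming at most 20 fields per row, per the error above):

import pandas as pd

# naming 20 columns up front lets pandas parse both 19- and 20-field rows;
# 19-field rows simply get NaN in the last column
df = pd.read_csv('filename.csv', header=None, names=range(20))

# drop the helper 20th column afterwards
df = df.drop(columns=[19])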

Upvotes: 0

Views: 1269

Answers (2)

Mike Organek

Reputation: 12494

Reading it into a dataframe that way buys you nothing: 'delimiter' never matches as a separator, so each cell will contain an entire row of your file.

Use the csv module instead, because it does not require every row to have the same number of fields.

import csv

# stream the file record by record, keeping only the first 19 fields of each
with open('filename.csv', 'r', newline='') as i, open('fixed.csv', 'w', newline='') as o:
    cr = csv.reader(i)
    cw = csv.writer(o)
    for row in cr:
        cw.writerow(row[0:19])
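
If you would rather keep a placeholder 20th column (the other option the question mentions) instead of truncating, the same loop can pad the short rows. A sketch, assuming 20 is the maximum field count (the padded.csv output name is just illustrative):

with open('filename.csv', 'r', newline='') as i, open('padded.csv', 'w', newline='') as o:
    cr = csv.reader(i)
    cw = csv.writer(o)
    for row in cr:
        # pad 19-field rows with an empty string so every record has 20 fields
        cw.writerow(row + [''] * (20 - len(row)))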


Upvotes: 3

Rob Raymond

Reputation: 31206

This generates a demo dataframe with 30 columns, then drops every column after the 20th:

df = pd.DataFrame({f"col{i}":[1,2,3] for i in range(30)})

df = df.drop(columns=[c for c in df.columns[20:]])
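
For what it's worth, positional slicing does the same thing in one step (a variant, not part of the original answer); for the question's file, the slice bound would be 19 rather than 20:

df = df.iloc[:, :20]  # keep only the first 20 columns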

Upvotes: 3
