Reputation: 973
I have a large CSV file that I would like to upload to Postgres using DataGrip. When I upload it I get the following error for a few records:
11807:98: actual: value separator (,), expected: record separator (\n)
The problem is that only 19 columns are recognized, while a very few records contain 20 values. Those records are not imported at all, and I do want all rows in my database.
I thought this would be a piece of cake in Python, so I started working with pandas and loaded the data with the following line:
df = pd.read_csv('filename.csv', sep='delimiter', header=None)
That does load the whole file. It is, however, much harder than I expected to either drop the values that appear after the 19th comma on a line, or to simply add a full 20th column that I can delete later. It feels like this is a trivial question, and I just need a push in the right direction.
Upvotes: 0
Views: 1269
Reputation: 12494
Reading it into a dataframe that way buys you nothing, since each cell will contain an entire row of your file. Use the csv module instead, because it does not require every row to have the same number of fields.
import csv

# Copy the file, keeping only the first 19 values of each row
with open('filename.csv', 'r', newline='') as i, open('fixed.csv', 'w', newline='') as o:
    cr = csv.reader(i)
    cw = csv.writer(o)
    for row in cr:
        cw.writerow(row[0:19])
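If you would rather keep the occasional 20th value and pad the shorter rows instead of truncating, a minimal variation of the same loop (assuming a target width of 20 and that empty strings are an acceptable filler) would be:

import csv

# Pad every row to 20 fields so nothing is dropped (assumed target width)
with open('filename.csv', 'r', newline='') as i, open('fixed.csv', 'w', newline='') as o:
    cw = csv.writer(o)
    for row in csv.reader(i):
        cw.writerow(row + [''] * (20 - len(row)))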
Upvotes: 3
Reputation: 31206
This generates a dataframe with 30 columns and then drops everything after the first 20 columns.
import pandas as pd

df = pd.DataFrame({f"col{i}": [1, 2, 3] for i in range(30)})
df = df.drop(columns=df.columns[20:])
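Applied to the actual file, the same idea can be combined with read_csv's names parameter, which pads the short rows with NaN instead of raising a tokenizing error. A sketch, assuming the widest rows have 20 values and that 'filename.csv' is the real path:

import pandas as pd

# Supply 20 column labels up front; rows with only 19 values get NaN in the last column
df = pd.read_csv('filename.csv', header=None, names=range(20))
df = df.drop(columns=[19])  # optional: discard the overflow column again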
Upvotes: 3