Reputation: 25
I'm working on a project where I have to parse a huge CSV file with 500,000 rows. Below is a small portion of the code as an example. It splits the columns fine, but it only reads 9,132 rows when I need it to go through all 500,000. The CSV is encoded in cp1252, which I suspect might be part of the issue, but I am not sure. Here is the error I am getting:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4123: character maps to <undefined>
Code:
import csv
outfile = open("newFile.csv", 'w')
with open("ProductFile.csv", "r") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        Item_ID = row[0]
        Sku = row[1]
        SKU_ID = row[2]
        altpartnum = row[3]
        Application = row[4]
        Brandcode = row[5]
        line = "{},{},{},{},{},{},\n".format(
            Item_ID, AD_SKU_ID, MemberSku, Application, Brandcode, Application, Brandcode)
        outfile.write(line)
outfile.close()
Upvotes: 0
Views: 63
Reputation: 178419
CP1252 doesn't support decoding byte 0x81, so the encoding is not CP1252. It might be ISO-8859-1 (a.k.a. latin1), but latin1 will decode every byte to some character without raising an error, so you may get mojibake if that is not the real encoding.
Suggested code (but use the correct encoding if not latin1):
import csv
# Parenthesized multi-item "with" requires Python 3.10+
with (open('ProductFile.csv', 'r', encoding='latin1', newline='') as fin,
      open('newFile.csv', 'w', encoding='latin1', newline='') as fout):
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for row in reader:
        writer.writerow(row[:6])  # first 6 columns or whatever you want to write
        # The OP code had undefined variables
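If you want to see which rows actually contain bytes that CP1252 cannot decode before settling on an encoding, a rough sketch like the following may help. The helper name find_undecodable_lines and the sample rows are made up for illustration; it reads the file in binary mode and tries to decode each physical line separately, so line numbers can differ from CSV row numbers if quoted fields contain embedded newlines.

```python
import os
import tempfile


def find_undecodable_lines(path, encoding="cp1252"):
    """Return (line_number, offending_byte) pairs for lines that fail to decode.

    Hypothetical helper, not part of the csv module.
    """
    problems = []
    with open(path, "rb") as f:  # binary mode: nothing is decoded on read
        for line_number, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the index of the first undecodable byte
                problems.append((line_number, raw[exc.start]))
    return problems


# Demo on a throwaway file; the sample rows are invented for illustration.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"1,abc\r\n2,bad\x81byte\r\n3,ok\r\n")
    sample_path = tmp.name

problems = find_undecodable_lines(sample_path)
os.remove(sample_path)
print([(n, hex(b)) for n, b in problems])

# latin1 maps all 256 byte values to characters, so it never raises.
all_bytes_as_text = bytes(range(256)).decode("latin1")
```

Running it against ProductFile.csv with encoding="cp1252" should point at the lines around row 9,132; looking at the raw bytes there may give a better hint about the real encoding than guessing.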
Upvotes: 2