Python - CSV reader - unable to read all lines

Question

I have the following snippet:

import csv

data = {}
with open('data.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, quotechar=None)
    count = 0
    for row in spamreader:
        data.update({row[0]:row[1]})
        count+=1
        

print(count)
print(len(data))

The file data.csv contains a total of 234611 rows and 2 columns.

The output is:

234611
52183

So the reader does see every line, but only 52183 entries end up in the data dictionary. Any idea how to debug this? It's also worth mentioning that the CSV file contains a lot of non-English characters.

damisan · Accepted Answer

A dictionary holds only one value per key: when the same key appears again, the new value overwrites the old one. Are you sure there are no duplicate keys (repeated values in the first column) in the CSV file?
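One way to test this is to count how often each first-column value occurs. The sketch below is a minimal example, assuming a small hypothetical in-memory sample (io.StringIO stands in for data.csv, and duplicate_keys is an illustrative helper, not part of the csv module). On the real file you would pass csv.reader(csvfile, quotechar=None) instead.

```python
import csv
import io
from collections import Counter


def duplicate_keys(rows):
    """Return {key: occurrences} for first-column values seen more than once.

    Hypothetical helper for illustration only.
    """
    counts = Counter(row[0] for row in rows if row)  # skip blank rows
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample standing in for data.csv: key "a" appears three times.
sample = io.StringIO("a,1\nb,2\na,3\nc,4\na,5\n")
dups = duplicate_keys(csv.reader(sample))
print(dups)  # {'a': 3}
```

If duplicates explain the whole gap, sum(n - 1 for n in dups.values()) on the real file should come out to 234611 - 52183 = 182428, the number of rows whose value was overwritten.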

If you want to keep every value for a given key, use collections.defaultdict(list) and append to it:

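import csv
from collections import defaultdict

data = defaultdict(list)
# Python 3: open the file in text mode with newline='' (the csv module
# expects str rows); 'rb' is only correct on Python 2.
# encoding='utf-8' is an assumption; use the file's real encoding.
with open('data.csv', newline='', encoding='utf-8') as csvfile:
    spamreader = csv.reader(csvfile, quotechar=None)
    count = 0
    for row in spamreader:
        # Append instead of assigning, so repeated keys keep every value.
        data[row[0]].append(row[1])
        count += 1

print(count)      # number of rows read
print(len(data))  # number of distinct keys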
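Note that with defaultdict(list), len(data) still counts distinct keys, so it would still print 52183 if duplicates are the cause; the row count only reappears when you sum the lengths of the value lists. A small sketch contrasting a plain dict with defaultdict(list), assuming a hypothetical in-memory sample in place of data.csv:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in place of data.csv; key "a" appears three times.
sample = io.StringIO("a,1\nb,2\na,3\nc,4\na,5\n")

plain = {}
grouped = defaultdict(list)
row_count = 0
for row in csv.reader(sample):
    plain[row[0]] = row[1]          # later rows overwrite earlier ones
    grouped[row[0]].append(row[1])  # every value is kept, grouped by key
    row_count += 1

# Total number of values stored across all keys.
total_values = sum(len(values) for values in grouped.values())
print(row_count, len(plain), len(grouped), total_values)  # 5 3 3 5
print(plain["a"], grouped["a"])  # 5 ['1', '3', '5']
```

Here both dictionaries have 3 keys, but only the defaultdict still holds all 5 values.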
