Asym

Reputation: 1938

Python - CSV reader - unable to read all lines

I have the following snippet:

import csv

data = {}
with open('data.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, quotechar=None)
    count = 0
    for row in spamreader:
        data.update({row[0]:row[1]})
        count+=1
        

print(count)
print(len(data))

The file data.csv contains a total of 234611 rows and 2 columns.

The output is:

234611

52183

So the reader reads every line, but the data dictionary ends up with far fewer entries than there are rows. Any idea how to debug this? It may also be worth mentioning that the CSV file contains a lot of non-English characters.

Upvotes: 1

Views: 5356

Answers (2)

jem

Reputation: 130

You may be adding duplicate keys (row[0]) to the dictionary; each repeated key overwrites the earlier entry instead of adding a new one. To keep every row, you could use count as the key, or append count to row[0] so that each key is unique.
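To check whether duplicate keys really explain the gap, one option is to count how often each first-column value occurs. The following is only a minimal sketch for Python 3: the helper name count_duplicate_keys and the in-memory sample are hypothetical stand-ins for data.csv, and quoting=csv.QUOTE_NONE is used here on the assumption that it matches the intent of quotechar=None in the question.

```python
import csv
import io
from collections import Counter


def count_duplicate_keys(csvfile):
    """Return {key: occurrences} for first-column values seen more than once."""
    # QUOTE_NONE: treat quote characters as ordinary data (assumed intent of quotechar=None)
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONE)
    # Skip completely empty lines, which csv.reader yields as []
    key_counts = Counter(row[0] for row in reader if row)
    return {key: n for key, n in key_counts.items() if n > 1}


# Hypothetical sample standing in for data.csv: 5 rows, 3 distinct keys
sample = io.StringIO("a,1\nb,2\na,3\nc,4\na,5\n")
duplicates = count_duplicate_keys(sample)
print(duplicates)  # {'a': 3}
```

If the dictionary returned for the real file is non-empty, the missing entries are rows whose key was already present.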

Upvotes: 1

damisan

Reputation: 1047

A dictionary keeps only one value per key: assigning to an existing key overwrites the previous value. Are you sure there are no duplicate keys in the first column of the CSV file?

If you want to collect all values for a given key, use defaultdict(list).

import csv
from collections import defaultdict

data = defaultdict(list)
with open('data.csv', 'rb') as csvfile:  # Python 2; on Python 3 use open('data.csv', newline='')
    spamreader = csv.reader(csvfile, quotechar=None)
    count = 0
    for row in spamreader:
        data[row[0]].append(row[1])
        count+=1


print(count)
print(len(data))
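As a rough illustration of what the grouped result looks like, here is a small Python 3 sketch that runs on an in-memory sample rather than data.csv. The function name group_rows and the sample rows are hypothetical, and quoting=csv.QUOTE_NONE is an assumed equivalent of quotechar=None. The difference between the row count and the number of keys is the number of rows that a plain dict would have overwritten.

```python
import csv
import io
from collections import defaultdict


def group_rows(csvfile):
    """Collect every second-column value under its first-column key."""
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONE)
    grouped = defaultdict(list)
    count = 0
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        grouped[row[0]].append(row[1])
        count += 1
    return count, grouped


# Hypothetical sample with a repeated non-English key
sample = io.StringIO("é,1\nb,2\né,3\n")
count, grouped = group_rows(sample)

# Keys that occur on more than one row
collisions = {key: values for key, values in grouped.items() if len(values) > 1}

print(count, len(grouped), count - len(grouped))  # 3 2 1
print(collisions)  # {'é': ['1', '3']}

# For a real file on Python 3, open it in text mode, for example:
# with open('data.csv', newline='', encoding='utf-8') as csvfile:  # encoding is an assumption
#     count, grouped = group_rows(csvfile)
```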

Upvotes: 5
