Reputation: 1938
I have the following snippet
import csv
data = {}
with open('data.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, quotechar=None)
    count = 0
    for row in spamreader:
        data.update({row[0]:row[1]})
        count+=1
print(count)
print(len(data))
The file data.csv contains a total of 234611 rows and 2 columns.
The output is:
234611
52183
So the reader is able to read all the lines, but most of them never end up in the data dictionary. Any idea how to debug this issue?
Also, it's worth mentioning that the csv file contains a lot of non-English characters.
Upvotes: 1
Views: 5356
Reputation: 130
It's possible that you are adding duplicate keys (row[0]) to the dictionary. You could ensure the keys are unique by using count as the key, or by appending count to row[0], instead.
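A minimal sketch of that idea, assuming Python 3. The in-memory sample below is a hypothetical stand-in for data.csv, and the tuple key is just one way to combine count with row[0]:

```python
import csv
import io

# Hypothetical sample standing in for data.csv; the key "a" appears twice.
sample = io.StringIO("a,1\nb,2\na,3\n")

data = {}
for count, row in enumerate(csv.reader(sample)):
    # Key on (row number, first column) so repeated first-column
    # values can no longer overwrite each other.
    data[(count, row[0])] = row[1]

print(len(data))
```

With keys built this way the dictionary should end up with one entry per row, at the cost of no longer being able to look a value up by row[0] alone.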
Upvotes: 1
Reputation: 1047
A dictionary discards (or rather overwrites) the value for a duplicate key. Are you sure there are no duplicate entries in the first column of the csv file?
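One quick way to check is to count how often each key occurs. This is only a sketch, assuming Python 3; the in-memory sample is a hypothetical stand-in for data.csv:

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for data.csv; the key "a" appears twice.
sample = io.StringIO("a,1\nb,2\na,3\n")

# Count occurrences of the first column across all rows.
key_counts = Counter(row[0] for row in csv.reader(sample))

# Keep only the keys that occur more than once.
duplicates = {key: n for key, n in key_counts.items() if n > 1}

print(len(key_counts))  # number of distinct keys
print(duplicates)       # keys that would collide in a plain dict
```

If the number of distinct keys matches the len(data) you are seeing, duplicate keys would explain the gap.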
If you want to collect all values for a given key, use defaultdict(list).
import csv
from collections import defaultdict
data = defaultdict(list)
with open('data.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, quotechar=None)
    count = 0
    for row in spamreader:
        data[row[0]].append(row[1])
        count+=1
print(count)
print(len(data))
Upvotes: 5