Reputation: 167
I need a quick way of counting unique values in a CSV file. It's a really big file (>100 MB) that can't be opened in Excel, for example, so I thought of writing a Python script.
The CSV looks like this:
431231
3412123
321231
1234321
12312431
634534
I just need the script to return how many different values are in the file. E.g., for the sample above the desired output would be:
6
So far this is what I have:
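import csv

input_file = open(r'C:\Users\guill\Downloads\uu.csv')
csv_reader = csv.reader(input_file, delimiter=',')
# "UserId" is pre-seeded as a key, hence the -1 when printing the count
thisdict = {
    "UserId": 1
}
for row in csv_reader:
    # only the first column of each row is recorded
    if row[0] not in thisdict:
        thisdict[row[0]] = 1
print(len(thisdict) - 1)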
Seems to be working fine, but I wonder if there's a better/more efficient/elegant way to do this?
Upvotes: 0
Views: 2523
Reputation: 1
Use a set instead of a dict, like this:
import csv

input_file = open(r'C:\Users\guill\Downloads\uu.csv')
csv_reader = csv.reader(input_file, delimiter=',')
aa = set()  # a set keeps each value only once
for row in csv_reader:
    aa.add(row[0])
print(len(aa))
Upvotes: 0
Reputation: 6196
A set is more tailor-made for this problem than a dictionary:
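import csv

# newline='' is what the csv module docs recommend for files passed to csv.reader
with open(r'C:\Users\guill\Downloads\uu.csv', newline='') as f:
    csv_reader = csv.reader(f, delimiter=',')
    uniqueIds = set()
    # read inside the with block, while the file is still open
    for row in csv_reader:
        uniqueIds.add(row[0])
print(len(uniqueIds))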
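If you want something reusable, the same set idea could be wrapped in a small function. This is only a sketch: the name count_unique and the column and has_header parameters are made up for illustration, and it assumes the values sit in a single column. It skips blank lines (which would otherwise raise an IndexError on row[0]) and can optionally discard a header row.

```python
import csv


def count_unique(path, column=0, has_header=False):
    """Count distinct values in one column of a CSV file, reading it row by row."""
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=',')
        if has_header:
            next(reader, None)  # discard the first row; no error if the file is empty
        # csv.reader yields an empty list for a blank line, so `if row` skips those
        return len({row[column] for row in reader if row})
```

You would call it as count_unique(r'C:\Users\guill\Downloads\uu.csv'), adding has_header=True if the first row is a column name. The file is still streamed one row at a time, so only the distinct values are held in memory.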
Upvotes: 2