Best way to count unique values from CSV in Python?

Question

I need a quick way to count the unique values in a CSV file. It's a really big file (>100 MB) that can't be opened in Excel, for example, so I thought of writing a Python script.

The CSV looks like this:

I just need the script to return how many different values are in the file. For the example above, the desired output would be:

6

So far this is what I have:

import csv
input_file = open(r'C:\Users\guill\Downloads\uu.csv')
csv_reader = csv.reader(input_file, delimiter=',')
thisdict = {
  "UserId": 1
}

for row in csv_reader:
    if row[0] not in thisdict:
        thisdict[row[0]] = 1

print(len(thisdict)-1)

Seems to be working fine, but I wonder if there's a better/more efficient/elegant way to do this?

James Shapiro · Accepted Answer

A set is better suited to this problem than a dictionary, since you only need to track which values have been seen, not a value for each key:

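import csv

# Do all the reading inside the with block: the file is closed once the block exits
with open(r'C:\Users\guill\Downloads\uu.csv', newline='') as f:
    csv_reader = csv.reader(f, delimiter=',')
    next(csv_reader, None)  # skip the "UserId" header row so it isn't counted
    uniqueIds = set()
    for row in csv_reader:
        uniqueIds.add(row[0])

print(len(uniqueIds))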
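
If you want something reusable, the same idea can be wrapped in a small function with a set comprehension. This is only a sketch: the function name count_unique_first_column and the has_header parameter are made up for illustration, and it assumes the values you care about are in the first column. The file is still read one row at a time, so only the distinct values are held in memory.

```python
import csv


def count_unique_first_column(path, has_header=True):
    """Count distinct values in the first column of a CSV file.

    Hypothetical helper for illustration; assumes the IDs are in column 0.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # discard the header row (no error if the file is empty)
        # Blank lines come back from csv.reader as empty lists, so skip them
        return len({row[0] for row in reader if row})
```

You would call it with the path to your file, e.g. count_unique_first_column(r'C:\Users\guill\Downloads\uu.csv'), and pass has_header=False if the file has no header line.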
