Guillermo Gruschka

Reputation: 167

Best way to count unique values from CSV in Python?

I need a quick way to count the unique values in a CSV file. It's a really big file (over 100 MB) that can't be opened in Excel, for example, so I thought of writing a Python script.

The CSV looks like this:

431231
3412123
321231
1234321
12312431
634534

I just need the script to return how many different values are in the file. For example, for the file above the desired output would be:

6

So far this is what I have:

import csv
input_file = open(r'C:\Users\guill\Downloads\uu.csv')
csv_reader = csv.reader(input_file, delimiter=',')
thisdict = {
  "UserId": 1
}

for row in csv_reader:
    if row[0] not in thisdict:
        thisdict[row[0]] = 1

print(len(thisdict)-1)

It seems to work fine, but I wonder if there's a better, more efficient, or more elegant way to do this?

Upvotes: 0

Views: 2523

Answers (2)

jackie zhong

Reputation: 1

Use a set instead of a dict, like this:

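import csv

# Open the file in a with-block so it is closed automatically.
with open(r'C:\Users\guill\Downloads\uu.csv', newline='') as input_file:
    csv_reader = csv.reader(input_file, delimiter=',')
    aa = set()
    for row in csv_reader:
        aa.add(row[0])  # a set ignores values it already holds
print(len(aa))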
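
The same loop can also be written as a set comprehension. This is only a minimal sketch: the function name count_unique_first_column is made up for illustration, and the sample data from the question is fed in from memory instead of from the file on disk. It also skips blank lines, which csv.reader returns as empty lists and which would otherwise raise an IndexError on row[0].

```python
import csv
import io


def count_unique_first_column(lines):
    """Count distinct values in the first column of CSV text.

    `lines` is any iterable of strings, e.g. an open file object.
    (Hypothetical helper, not part of the original answer.)
    """
    reader = csv.reader(lines)
    # Blank lines come back as empty lists, so filter them out before indexing.
    return len({row[0] for row in reader if row})


# Sample data from the question, held in memory instead of a file on disk.
sample = io.StringIO("431231\n3412123\n321231\n1234321\n12312431\n634534\n")
print(count_unique_first_column(sample))  # 6
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, and the values are still read one row at a time.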

Upvotes: 0

James Shapiro

Reputation: 6196

A set is a better fit for this problem than a dictionary:

import csv

# Read inside the with-block: the file is closed as soon as the block ends.
with open(r'C:\Users\guill\Downloads\uu.csv', newline='') as f:
    csv_reader = csv.reader(f, delimiter=',')
    uniqueIds = set()

    for row in csv_reader:
        uniqueIds.add(row[0])

print(len(uniqueIds))
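
One caveat, assuming the real file starts with a header row (the "UserId" key in the question's dict suggests it might, although the sample shown has none): a plain set would count that header as one of the values. Below is a rough sketch that skips it. The names count_unique_ids and has_header are hypothetical and not from the original posts.

```python
import csv


def count_unique_ids(path, has_header=True):
    """Count distinct first-column values in the CSV file at `path`.

    `has_header` is an assumption about the file layout: set it to False
    if the first row is already data.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            # Discard the header row; the None default avoids
            # StopIteration if the file is empty.
            next(reader, None)
        # Skip blank lines, which csv.reader yields as empty lists.
        return len({row[0] for row in reader if row})
```

It could be called as count_unique_ids(r'C:\Users\guill\Downloads\uu.csv'), or with has_header=False for a file like the sample in the question.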

Upvotes: 2
