Reputation: 323

Count Distinct Values in a List of Lists

I just imported the values from a .csv file into a list of lists, and now I need to know how many distinct users there are. The imported data looks like the following:

[['123', 'apple'], ['123', 'banana'], ['345', 'apple'], ['567', 'berry'], ['567', 'banana']]

Basically, I need to know how many distinct users there are (the first value in each sub-list is a user ID; 3 in this case, over 6,000 in the full file after doing some Excel filtering), and what the frequencies of the foods are: {'apple': 2, 'banana': 2, 'berry': 1}.

Here is the code I have tried to use for distinct values counts (using Python 2.7):

import csv
with open('food.csv', 'rb') as food:
    csv_food = csv.reader(food)
    next(csv_food)  # skip the first (header) line
    result_list = list(csv_food)

result_distinct = list(x for l in result_list for x in l)

print len(result_distinct)

Upvotes: 6

Answers (5)

Bhaskar Gupta

Reputation: 146

A = [[0, 1], [0, 3], [1, 3], [3, 4], [3, 6], [4, 5], [4, 7], [5, 7], [6, 4]]
K = []
for pair in A:
    K.extend(pair)
print(set(K))

OUTPUT:

{0, 1, 3, 4, 5, 6, 7}

In Python, list.extend adds each element of a sub-list to K instead of appending the sub-list itself as a single item, which is what we need here. Passing K to set then gives the distinct values.
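To make the difference concrete, here is a minimal sketch contrasting append with extend. It assumes Python 3, and the names `pairs`, `appended`, and `extended` are illustrative rather than taken from the answer.

```python
# Illustrative data, not from the original answer.
pairs = [[0, 1], [0, 3], [1, 3]]

appended = []
extended = []
for pair in pairs:
    appended.append(pair)   # adds the whole sub-list as one element
    extended.extend(pair)   # adds each element of the sub-list

print(appended)               # [[0, 1], [0, 3], [1, 3]]
print(extended)               # [0, 1, 0, 3, 1, 3]
print(sorted(set(extended)))  # [0, 1, 3]
```

Note that flattening mixes every column together, so on the question's data this approach would count user IDs and food names in the same set.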

Upvotes: 0

timbmg

Reputation: 3338

You can use [x[0] for x in result_list] to get a list of all the IDs. Then create a set from it, which keeps only the unique items in that list. The length of the set gives you the number of unique users.

len(set([x[0] for x in result_list]))

Upvotes: 3

SparkAndShine

Reputation: 18045

For the first question, use a set:

import operator

lists = [['123', 'apple'], ['123', 'banana'], ['345', 'apple'], ['567', 'berry'], ['567', 'banana']]
nrof_users = len(set(map(operator.itemgetter(0), lists)))

print(nrof_users)
# 3

For the second question, use collections.Counter:

import collections
import operator

result = collections.Counter(map(operator.itemgetter(1), lists))

print(result)
# Counter({'apple': 2, 'banana': 2, 'berry': 1})

Upvotes: 0

Lucas

Reputation: 4097

To get the distinct users, you can use a set:

result_distinct = len({x[0] for x in result_list})

For the frequencies, you can use collections.Counter:

import collections
freqs = collections.Counter([x[1] for x in result_list])

Upvotes: 0

willeM_ Van Onsem

Reputation: 477794

Well that is what a Counter is all about:

import csv
from collections import Counter

with open('food.csv', 'rb') as food:
    csv_food = csv.reader(food)
    next(csv_food)  # skip the first (header) line
    result_list = list(csv_food)  # each row becomes [user_id, food]

# count how often each food (second column) occurs
result_counter = Counter(x[1] for x in result_list)

print len(result_counter)

A Counter is a special dictionary. Internally the dictionary will contain {'apple': 2, 'banana': 2, 'berry': 1}, so you can inspect all elements with their counts. len(result_counter) will give the number of distinct elements, whereas sum(result_counter.values()) will give the total number of elements.
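As a quick check of the two quantities described above, here is a small sketch using the sample rows from the question. It assumes Python 3 (so print is a function), and `distinct_foods` and `total_foods` are illustrative names.

```python
from collections import Counter

# Sample rows from the question: [user_id, food]
result_list = [['123', 'apple'], ['123', 'banana'], ['345', 'apple'],
               ['567', 'berry'], ['567', 'banana']]

result_counter = Counter(row[1] for row in result_list)

distinct_foods = len(result_counter)         # number of distinct keys
total_foods = sum(result_counter.values())   # total number of rows counted

print(dict(result_counter))     # {'apple': 2, 'banana': 2, 'berry': 1}
print(distinct_foods)           # 3
print(total_foods)              # 5
print(result_counter['kiwi'])   # 0, because a missing key counts as zero
```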

EDIT: apparently you want to count the number of distinct users. You can do this with:

len({x[0] for x in result_list})

The {.. for x in result_list} is a set comprehension.
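Putting both parts together, the following sketch reads the rows and computes the distinct user count and the food frequencies in one pass each. It assumes Python 3, uses an in-memory string as a hypothetical stand-in for food.csv, and the header names `user_id` and `food` are assumptions, since the question does not show the header row.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for food.csv; the header names are assumed.
csv_text = (
    "user_id,food\n"
    "123,apple\n123,banana\n345,apple\n567,berry\n567,banana\n"
)

reader = csv.reader(io.StringIO(csv_text))
next(reader)                 # skip the header row
result_list = list(reader)   # [['123', 'apple'], ['123', 'banana'], ...]

distinct_users = {row[0] for row in result_list}        # set comprehension
food_counts = Counter(row[1] for row in result_list)    # food frequencies

print(len(distinct_users))   # 3
print(dict(food_counts))     # {'apple': 2, 'banana': 2, 'berry': 1}
```

The set comprehension builds the same set as set(row[0] for row in result_list); which spelling to use is a matter of taste.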

Upvotes: 0
