Reputation: 323
I just imported the values from a .csv file into a list of lists, and now I need to know how many distinct users there are. The data looks like the following:
[['123', 'apple'], ['123', 'banana'], ['345', 'apple'], ['567', 'berry'], ['567', 'banana']]
Basically, I need to know how many distinct users there are (the first value in each sub-list is a user ID; 3 in this case, over 6,000 after doing some Excel filtering), and what the frequencies of the foods are: {'apple': 2, 'banana': 2, 'berry': 1}.
Here is the code I have tried to use for distinct values counts (using Python 2.7):
import csv
with open('food.csv', 'rb') as food:
    next(food)
    for line in food:
        csv_food = csv.reader(food)
        result_list = list(csv_follows)
result_distinct = list(x for l in result_list for x in l)
print len(result_distinct)
Upvotes: 6
Views: 7460
Reputation: 146
A=[[0, 1],[0, 3],[1, 3],[3, 4],[3, 6],[4, 5],[4, 7],[5, 7],[6, 4]]
K = []
for pair in A:
    # extend adds each element of the sub-list to K
    K.extend(pair)
print(set(K))
OUTPUT:
{0, 1, 3, 4, 5, 6, 7}
In Python, list.extend adds each element of its argument to the list instead of appending the argument as a single nested item, which is what we need here. Passing the flattened list to set then gives the distinct values.
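To make the difference between extend and append concrete, here is a minimal sketch. The data is a shortened, hypothetical version of the list A above, and the names appended and extended are only illustrative.

```python
# Hypothetical sample data (a shortened version of A above)
pairs = [[0, 1], [0, 3], [1, 3]]

appended = []
extended = []
for pair in pairs:
    appended.append(pair)   # adds the sub-list itself as one element
    extended.extend(pair)   # adds each element of the sub-list

print(appended)   # [[0, 1], [0, 3], [1, 3]]
print(extended)   # [0, 1, 0, 3, 1, 3]

# A set keeps each value once; its display order is not guaranteed
distinct = set(extended)
print(len(distinct))   # 3
```

Note that this flattens every column, so for the question's data it would mix user IDs and food names in one set.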
Upvotes: 0
Reputation: 3338
You can use [x[0] for x in result_list] to get a list of all the IDs. Then create a set from it, which keeps only the unique items in that list. The length of the set then gives you the number of unique users.
len(set([x[0] for x in result_list]))
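As a small sketch of the intermediate steps, using the sample data from the question (the names ids and unique_ids are just for illustration):

```python
result_list = [['123', 'apple'], ['123', 'banana'], ['345', 'apple'],
               ['567', 'berry'], ['567', 'banana']]

ids = [x[0] for x in result_list]   # every user ID, duplicates included
unique_ids = set(ids)               # duplicates collapse inside a set

print(ids)               # ['123', '123', '345', '567', '567']
print(len(unique_ids))   # 3
```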
Upvotes: 3
Reputation: 18045
For the first question, use set:
import operator
lists = [['123', 'apple'], ['123', 'banana'], ['345', 'apple'], ['567', 'berry'], ['567', 'banana']]
nrof_users = len(set(map(operator.itemgetter(0), lists)))
print(nrof_users)
# 3
For the second question, use collections.Counter:
import collections
import operator
result = collections.Counter(map(operator.itemgetter(1), lists))
print(result)
# Counter({'apple': 2, 'banana': 2, 'berry': 1})
Upvotes: 0
Reputation: 4097
To get the distinct users, you can use a set:
result_distinct = len({x[0] for x in result_list})
For the frequencies, you can use collections.Counter:
import collections
freqs = collections.Counter([x[1] for x in result_list])
Upvotes: 0
Reputation: 477794
Well, that is what a Counter is all about:
import csv
from collections import Counter

with open('food.csv', 'rb') as food:
    reader = csv.reader(food)
    next(reader)  # skip the header row
    result_list = list(reader)  # one [user_id, food] list per row

# count how often each food (second column) occurs
result_counter = Counter(x[1] for x in result_list)
print len(result_counter)
A Counter is a special dictionary. For the sample data it will contain {'apple': 2, 'banana': 2, 'berry': 1}, so you can inspect all elements with their counts. len(result_counter) gives the number of distinct elements, whereas sum(result_counter.values()) gives the total number of elements.
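A short sketch of those two calls on the sample data from the question; the list is typed in by hand here rather than read from the file:

```python
from collections import Counter

result_list = [['123', 'apple'], ['123', 'banana'], ['345', 'apple'],
               ['567', 'berry'], ['567', 'banana']]

# count the second column (the food) of every row
result_counter = Counter(x[1] for x in result_list)

print(result_counter['apple'])        # 2
print(len(result_counter))            # 3 distinct foods
print(sum(result_counter.values()))   # 5 rows in total
```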
EDIT: apparently you want to count the number of distinct users. You can do this with:
len({x[0] for x in result_list})
The {.. for x in result_list} is a set comprehension: it builds a set directly, so duplicate user IDs are dropped.
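Putting both counts together, a minimal end-to-end sketch might look like the following. It is written for Python 3 rather than the 2.7 used in the question, and it reads from an in-memory io.StringIO stand-in instead of food.csv; the header names user_id and food are assumptions, since the real file's header is not shown.

```python
import csv
import io
from collections import Counter

# Stand-in for food.csv; the header row is assumed for illustration
sample = io.StringIO(
    "user_id,food\n"
    "123,apple\n"
    "123,banana\n"
    "345,apple\n"
    "567,berry\n"
    "567,banana\n"
)

reader = csv.reader(sample)
next(reader)                  # skip the header row
result_list = list(reader)    # one [user_id, food] list per row

distinct_users = len({row[0] for row in result_list})   # set comprehension
food_counts = Counter(row[1] for row in result_list)

print(distinct_users)      # 3
print(dict(food_counts))   # {'apple': 2, 'banana': 2, 'berry': 1}
```

With a real file in Python 3, the StringIO object would be replaced by open('food.csv', newline='') inside a with block.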
Upvotes: 0