Let's say I have a list like this:
[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)]
The first number is the user_id, the second is the tv_program_code, and the third is the season_id.
How can I find each tv_program_code that a user has subscribed to for more than one season, and then print the user_id and the tv_program_code? For example:
9600001 17
Or is there a data structure you would suggest for this?
One method is to use collections.Counter.
The idea is to count the number of seasons per (user, program) combination using a dictionary, then keep only the combinations with a count greater than 1 via a dictionary comprehension.
from collections import Counter
lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)]
c = Counter()
for user, program, season in lst:
    c[(user, program)] += 1
print(c)
# Counter({(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2,
# (9600002, 14): 1, (9600001, 14): 1})
res = {k: v for k, v in c.items() if v > 1}
print(res)
# {(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2}
print(res.keys())
# dict_keys([(9600002, 42), (9600001, 17), (9600003, 11)])
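To print the result in the format the question asks for (user_id followed by tv_program_code), you can iterate over the qualifying keys. A minimal sketch that repeats the counting step so it runs on its own; the helper name multi_season_pairs is illustrative, and the printed order assumes Python 3.7+ (insertion-ordered dicts):

```python
from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)]

def multi_season_pairs(rows):
    """Return (user, program) pairs that appear in more than one row (hypothetical helper)."""
    # Count rows per (user, program); the season value itself is ignored.
    c = Counter((user, program) for user, program, _season in rows)
    return [pair for pair, n in c.items() if n > 1]

for user, program in multi_season_pairs(lst):
    print(user, program)
# 9600002 42
# 9600001 17
# 9600003 11
```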
Note on Counter versus defaultdict(int)
Counter was roughly twice as slow as defaultdict(int) in the benchmark below. You can easily switch to defaultdict(int) if performance matters and none of these Counter features are relevant to you:
- Querying a missing Counter key returns 0 without adding the key (a defaultdict inserts it).
- You can add and subtract Counter objects.
- Counter offers additional methods, e.g. elements, most_common.
Benchmarking on Python 3.6.2:
from collections import defaultdict, Counter
lst = lst * 100000  # 1,000,000 tuples
def counter(lst):
    c = Counter()
    for user, program, season in lst:
        c[(user, program)] += 1
    return c
def dd(lst):
    d = defaultdict(int)
    for user, program, season in lst:
        d[(user, program)] += 1
    return d
# %timeit is an IPython/Jupyter magic, not plain Python
%timeit counter(lst)  # 900 ms
%timeit dd(lst)       # 450 ms
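A quick check of the Counter features listed above. This is a sketch; the keys and counts in a and b are made up for illustration:

```python
from collections import Counter, defaultdict

c = Counter()
d = defaultdict(int)

# Querying a missing key: both return 0, but only defaultdict stores the key.
print(c["missing"], "missing" in c)  # 0 False
print(d["missing"], "missing" in d)  # 0 True

# Counters support arithmetic with other Counters.
a = Counter({(9600001, 17): 3, (9600002, 42): 1})
b = Counter({(9600001, 17): 1, (9600002, 42): 1})
print(a + b)  # Counter({(9600001, 17): 4, (9600002, 42): 2})
print(a - b)  # Counter({(9600001, 17): 2}); counts that drop to 0 are removed

# Extra methods that a plain defaultdict does not have.
print(a.most_common(1))      # [((9600001, 17), 3)]
print(sorted(a.elements()))  # each key repeated as many times as its count
```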
There are many ways to do this task.
First, using defaultdict:
import collections
data=[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)]
d = collections.defaultdict(list)
for i in data:
    d[(i[0], i[1])].append(i)
print(list(filter(lambda x: len(x) > 1, d.values())))
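output (Python 3.7+, where dicts preserve insertion order; older versions may list the groups in a different order):
[[(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600003, 11, 1), (9600003, 11, 4)]]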
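The approaches here count rows rather than distinct seasons. If the data could contain the same (user_id, tv_program_code, season_id) row twice (an assumption; the question does not say), a duplicate row would be counted as a second season. Collecting season_ids into a set per key avoids that. A sketch, assuming Python 3.7+ for the printed order:

```python
from collections import defaultdict

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]

# Map each (user_id, tv_program_code) pair to the set of distinct season_ids.
seasons = defaultdict(set)
for user, program, season in data:
    seasons[(user, program)].add(season)

# Keep pairs subscribed for more than one distinct season.
multi = {pair: sorted(s) for pair, s in seasons.items() if len(s) > 1}
for (user, program), s in multi.items():
    print(user, program, s)
# 9600002 42 [1, 3, 6]
# 9600001 17 [1, 3, 4]
# 9600003 11 [1, 4]
```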
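Second, using itertools.groupby:
import itertools
# groupby only merges consecutive items, so sort first to bring equal (user_id, tv_program_code) keys together
groups = [list(j) for i, j in itertools.groupby(sorted(data), key=lambda x: (x[0], x[1]))]
print(list(filter(lambda x: len(x) > 1, groups)))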
output:
[[(9600001, 17, 1), (9600001, 17, 3), (9600001, 17, 4)], [(9600002, 42, 1), (9600002, 42, 3), (9600002, 42, 6)], [(9600003, 11, 1), (9600003, 11, 4)]]
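Sorting is essential here: itertools.groupby only merges consecutive items with equal keys, so unsorted input produces fragmented groups. A small sketch comparing both cases (the key helper is an illustrative name):

```python
import itertools

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]

def key(row):
    # Group by (user_id, tv_program_code).
    return (row[0], row[1])

# Without sorting, the (9600002, 42) rows are split into two separate groups.
unsorted_groups = [list(g) for _, g in itertools.groupby(data, key=key)]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in itertools.groupby(sorted(data, key=key), key=key)]

print(len(unsorted_groups), len(sorted_groups))  # 9 5
```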
Third approach
Finally, you can use a manual approach without any imports:
d = {}
for i in data:
    if (i[0], i[1]) not in d:
        d[(i[0], i[1])] = [i]
    else:
        d[(i[0], i[1])].append(i)
print(list(filter(lambda x: len(x) > 1, d.values())))
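output (Python 3.7+, where dicts preserve insertion order; older versions may list the groups in a different order):
[[(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600003, 11, 1), (9600003, 11, 4)]]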
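The if/else in the manual approach can be shortened with the built-in dict.setdefault, which returns the list already stored under a key or inserts and returns a new empty one. This is an alternative spelling rather than the code above; a sketch, still without imports:

```python
data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]

d = {}
for row in data:
    # setdefault inserts [] the first time a key is seen, then returns the stored list.
    d.setdefault((row[0], row[1]), []).append(row)

result = [rows for rows in d.values() if len(rows) > 1]
print(result)
```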