Shane FAN

Reputation: 29

How to count an element in a list inside a list in Python

Let's say I have a list like this:

[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)] 

The first number is the user_id, the second is the tv_program_code, and the third is the season_id.

My question

How can I find each tv_program_code for which a user has subscribed to more than one season, and then print the user_id and the tv_program_code? For example:

9600001 17

Or do you have any suggestions about which data structure I should use?

Upvotes: 0

Views: 304

Answers (2)

jpp

Reputation: 164673

One method is to use collections.Counter.

The idea is to count the number of seasons per (user, program) combination using a dictionary.

Then filter for counts greater than 1 via a dictionary comprehension.

from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)] 

c = Counter()

for user, program, season in lst:
    c[(user, program)] += 1

print(c)

# Counter({(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2,
#          (9600002, 14): 1, (9600001, 14): 1})

res = {k: v for k, v in c.items() if v > 1}

print(res)

# {(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2}

print(res.keys())

# dict_keys([(9600002, 42), (9600001, 17), (9600003, 11)])
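To get output in the form the question asks for (user_id followed by tv_program_code), the keys that pass the filter can be unpacked and printed. This is a minimal sketch that repeats the sample data so it runs on its own; the name pairs is only illustrative, and sorting is added just to make the print order predictable:

```python
from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)]

# Count rows per (user, program); the season value is not needed for counting.
c = Counter((user, program) for user, program, _season in lst)

# Keep the pairs seen more than once, sorted for a stable print order.
pairs = sorted(key for key, count in c.items() if count > 1)

for user, program in pairs:
    print(user, program)
```

This prints one "user_id tv_program_code" pair per line.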

Note on Counter versus defaultdict(int)

In the benchmark below, Counter is about twice as slow as defaultdict(int). You can easily switch to defaultdict(int) if performance matters and none of the following Counter features are relevant to you:

  1. Missing Counter keys don't get added automatically when querying.
  2. You can add / subtract Counter objects.
  3. Counter offers additional methods, e.g. elements, most_common.
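The three differences listed above can be seen in a short sketch. The keys "a", "b" and "z" are made-up sample values, not taken from the question's data:

```python
from collections import Counter, defaultdict

c = Counter({"a": 2})
d = defaultdict(int, {"a": 2})

# 1. Looking up a missing key: Counter returns 0 without storing the key,
#    while defaultdict(int) inserts the key with the default value 0.
missing_c = c["b"]
missing_d = d["b"]
print("b" in c, "b" in d)

# 2. Counter objects can be added and subtracted.
total = c + Counter({"a": 1, "z": 5})
diff = total - c
print(total, diff)

# 3. Extra helper methods such as most_common and elements.
print(total.most_common(1))
print(sorted(total.elements()))
```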

Benchmarking on Python 3.6.2.

from collections import defaultdict, Counter

lst = lst * 100000

def counter(lst):
    c = Counter()
    for user, program, season in lst:
        c[(user, program)] += 1
    return c

def dd(lst):
    d = defaultdict(int)
    for user, program, season in lst:
        d[(user, program)] += 1
    return d

# %timeit is an IPython/Jupyter magic command
%timeit counter(lst)  # 900 ms
%timeit dd(lst)       # 450 ms
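Outside IPython, a comparable measurement can be made with the standard timeit module. The following is only a sketch: the function names and the smaller repetition factor are assumptions chosen so it finishes quickly, and the timings will vary by machine and Python version, so no numbers are shown.

```python
from collections import Counter, defaultdict
from timeit import timeit

base = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]
data = base * 1000  # smaller than the benchmark above, to keep the run short


def count_with_counter(rows):
    c = Counter()
    for user, program, _season in rows:
        c[(user, program)] += 1
    return c


def count_with_defaultdict(rows):
    d = defaultdict(int)
    for user, program, _season in rows:
        d[(user, program)] += 1
    return d


# Total time in seconds for 5 calls of each function.
t_counter = timeit(lambda: count_with_counter(data), number=5)
t_dd = timeit(lambda: count_with_defaultdict(data), number=5)
print(f"Counter: {t_counter:.3f}s  defaultdict(int): {t_dd:.3f}s")
```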

Upvotes: 2

Aaditya Ura

Reputation: 12669

There are many ways to do this task.

First, using defaultdict:

import collections
data=[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)]

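# Group the full rows by their (user_id, tv_program_code) pair
d = collections.defaultdict(list)

for i in data:
    d[(i[0], i[1])].append(i)

# Keep only the groups that contain more than one row
print(list(filter(lambda x: len(x) > 1, d.values())))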

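output (the order of the groups depends on the Python version; dicts preserve insertion order only from Python 3.7 on):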

[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]]

Second, using itertools.groupby:

import itertools

grouped = itertools.groupby(sorted(data), key=lambda x: (x[0], x[1]))
groups = [list(rows) for _, rows in grouped]
print(list(filter(lambda x: len(x) > 1, groups)))

output:

[[(9600001, 17, 1), (9600001, 17, 3), (9600001, 17, 4)], [(9600002, 42, 1), (9600002, 42, 3), (9600002, 42, 6)], [(9600003, 11, 1), (9600003, 11, 4)]]
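The sorted(...) call matters here: itertools.groupby only merges consecutive items that share a key, so equal keys that are not adjacent end up in separate groups. A small sketch with made-up rows (not the question's data) shows the difference; pair_key is a hypothetical helper name:

```python
from itertools import groupby

# Made-up rows where the two (1, 10) entries are not adjacent.
rows = [(1, 10, 1), (2, 20, 1), (1, 10, 2)]


def pair_key(row):
    """Return the (user_id, tv_program_code) part of a row."""
    return (row[0], row[1])


# Without sorting, (1, 10) shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, key=pair_key)]

# After sorting, equal keys are adjacent and form a single group.
sorted_keys = [k for k, _ in groupby(sorted(rows), key=pair_key)]

print(unsorted_keys)  # [(1, 10), (2, 20), (1, 10)]
print(sorted_keys)    # [(1, 10), (2, 20)]
```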

Third approach

Finally, you can take a manual approach without any imports:

d = {}

for i in data:
    key = (i[0], i[1])
    if key not in d:
        d[key] = [i]
    else:
        d[key].append(i)

print(list(filter(lambda x: len(x) > 1, d.values())))

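output (the order of the groups depends on the Python version; dicts preserve insertion order only from Python 3.7 on):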

[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]]
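The approaches above return the grouped rows rather than the user_id and tv_program_code pairs the question asks to print. One possible variation, sketched here, stores the seasons of each pair in a set so that only distinct seasons are counted. That guards against duplicate rows, which is an assumption on my part since the sample data contains none; the names seasons and result are illustrative:

```python
from collections import defaultdict

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]

# Map each (user_id, tv_program_code) pair to the set of its season ids.
seasons = defaultdict(set)
for user, program, season in data:
    seasons[(user, program)].add(season)

# Keep the pairs with more than one distinct season, sorted for stable output.
result = sorted(pair for pair, s in seasons.items() if len(s) > 1)

for user, program in result:
    print(user, program)
```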

Upvotes: 1
