Count occurrences in a list of sets

Question

I have a variable containing a list of lists, each containing 2 sets. It looks like this:

[[{'angular', 'java', 'sql', 'xml-schema'},
  {'db2', 'docker', 'git', 'hibernate', 'jenkins', 'maven', 'rest'}],
 [{'java'}, {'maven'}],
 [{'java'}, {'oracle'}],
 [{'c++', 'cobol', 'java', 'javascript'}, set()],
 [{'angular', 'java'}, set()],
 [{'java'}, set()]]

Now what I would like to do is count the occurrences of every single item across all the sets, but I'm not sure how to go about it. Should I flatten the whole list, or is there some set-related function that can do this?

Thanks!

azro · Accepted Answer

You can use a collections.Counter and pass it a flattened version of your data:

from collections import Counter

values: list[list[set[str]]] = [
    [{'angular', 'java', 'sql', 'xml-schema'}, {'db2', 'docker', 'git', 'hibernate', 'jenkins', 'maven', 'rest'}],
    [{'java'}, {'maven'}],
    [{'java'}, {'oracle'}],
    [{'c++', 'cobol', 'java', 'javascript'}, set()],
    [{'angular', 'java'}, set()],
    [{'java'}, set()]
]

language = 'java'

# Flatten both nesting levels (the outer list, then each set) and count every word.
occurrences = Counter(word for sublist in values for subset in sublist for word in subset)
print(occurrences.most_common(3))  # [('java', 6), ('angular', 2), ('maven', 2)]
print(occurrences[language])  # 6
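
If you would rather not write the nested comprehension, itertools.chain.from_iterable can do the flattening. This is only a sketch of the same idea on a shortened, made-up sample (the values below are not your full data); each call removes one level of nesting.

```python
from collections import Counter
from itertools import chain

# Shortened, hypothetical sample with the same shape: a list of [set, set] pairs.
values = [
    [{'angular', 'java'}, {'maven'}],
    [{'java'}, {'maven'}],
    [{'java'}, set()],
]

# First call yields the sets, second call yields the words inside them.
words = chain.from_iterable(chain.from_iterable(values))
counts = Counter(words)

print(counts['java'])   # 3
print(counts['maven'])  # 2
```

Both versions feed Counter a lazy stream of words, so no intermediate flattened list is built.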

If you want to count the 2 sets separately (languages vs. other), do it this way:

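# Count only the first set of each pair (the languages).
occurrences_languages = Counter(word for sublist in values for word in sublist[0])
# Ties come out in set-iteration order, so the last entry may differ between runs.
print(occurrences_languages.most_common(3))  # e.g. [('java', 6), ('angular', 2), ('sql', 1)]

# Count only the second set of each pair (everything else).
occurrences_other = Counter(word for sublist in values for word in sublist[1])
# The entries with a count of 1 are tied, so which ones appear may differ between runs.
print(occurrences_other.most_common(3))  # e.g. [('maven', 2), ('docker', 1), ('rest', 1)]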
