Reputation: 1349
I have a list of lists like the following:
listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]
I want to find the number of sublists in which each unique value in listoflist occurs. For example, "A" shows up in two sublists, and "D" also shows up in two sublists, even though it occurs twice in listoflist[2].
How can I get a dataframe with each unique element in one column and its frequency (the number of sublists it shows up in) in another column?
Upvotes: 1
Views: 2299
Reputation: 153540
Another way to do this is to use pandas:
import pandas as pd
df = pd.DataFrame(listoflist)
df.stack().reset_index().groupby(0)['level_0'].nunique().to_dict()
Output:
{'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2}
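Since the question asks for a dataframe rather than a dict, here is a minimal sketch of the same idea that ends in a two-column frame. The column names "sublist", "position", "element" and "frequency" are my own labels, not anything pandas requires:

```python
import pandas as pd

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

# One row per sublist; shorter sublists are padded with missing values.
df = pd.DataFrame(listoflist)

# Long format: one row per (sublist index, position, value).
# dropna() discards the padding in case stack() keeps it.
long = df.stack().dropna().reset_index()
long.columns = ["sublist", "position", "element"]  # assumed names

# Count the distinct sublists each element appears in.
result = (
    long.groupby("element")["sublist"]
    .nunique()
    .reset_index(name="frequency")
)
print(result)
```

The groupby sorts by element, so the rows come out in alphabetical order.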
Upvotes: 1
Reputation: 44992
Essentially, it seems that you want something like
Counter(x for xs in listoflist for x in set(xs))
Each list is converted into a set first, to exclude duplicates. Then the sequence of sets is flat-mapped and fed into the Counter.
Full code:
from collections import Counter
listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]
c = Counter(x for xs in listoflist for x in set(xs))
print(c)
Results in:
# output:
# Counter({'B': 2, 'C': 2, 'Z': 2, 'D': 2, 'A': 2, 'Y': 1, 'X': 1})
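If you need the result as a dataframe, as the question asks, one possible way is to pass the Counter's items to pandas. This is only a sketch; the column names "element" and "frequency" are placeholders I picked:

```python
from collections import Counter

import pandas as pd

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

# set(xs) removes duplicates within a sublist, so each sublist
# contributes at most 1 to an element's count.
c = Counter(x for xs in listoflist for x in set(xs))

# sorted() makes the row order deterministic (set iteration order is not).
result = pd.DataFrame(sorted(c.items()), columns=["element", "frequency"])
print(result)
```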
Upvotes: 2
Reputation: 16434
You can use itertools.chain together with collections.Counter:
In [94]: import itertools as it
In [95]: from collections import Counter
In [96]: Counter(it.chain(*map(set, listoflist)))
Out[96]: Counter({'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2})
As mentioned in the comment by @Jean-François Fabre, you can also use:
In [97]: Counter(it.chain.from_iterable(map(set, listoflist)))
Out[97]: Counter({'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2})
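The two spellings give the same counts. The practical difference is that chain(*...) has to unpack every set into call arguments up front, while chain.from_iterable pulls them one at a time, which matters mostly for very long or lazily generated inputs. A small sketch as a plain script (sublist_sets is a hypothetical helper I added for illustration):

```python
import itertools as it
from collections import Counter

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]


def sublist_sets(lists):
    """Hypothetical helper: lazily yield one de-duplicated set per sublist."""
    for xs in lists:
        yield set(xs)


# The * operator consumes the whole generator before chain() is called.
counts_star = Counter(it.chain(*sublist_sets(listoflist)))

# from_iterable() takes the generator itself and advances it on demand.
counts_lazy = Counter(it.chain.from_iterable(sublist_sets(listoflist)))

print(counts_star == counts_lazy)
```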
Upvotes: 3