Reputation: 43
I am trying to show the n most common items of a list, but I am getting the error: TypeError: unhashable type: 'list'
import collections
test = [[u'the\xa0official', u'MySQL'], [u'MySQL', u'repos'], [u'repos', u'for'], [u'for', u'Linux'], [u'Linux', u'a'], [u'a', u'little'], [u'little', u'over'], [u'over', u'a'], [u'a', u'year'], [u'year', u'ago,'], [u'ago,', u'the'], [u'the', u'offering'], [u'offering', u'has'], [u'has', u'grown'], [u'grown', u'steadily.\xa0Starting'], [u'steadily.\xa0Starting', u'off'], [u'off', u'with'], [u'with', u'support'], [u'support', u'for'], [u'for', u'the'], [u'the', u'Yum'], [u'Yum', u'based'], [u'based', u'family'], [u'family', u'of\xa0Red'], [u'of\xa0Red', u'Hat/Fedora/Oracle'], [u'Hat/Fedora/Oracle', u'Linux,'], [u'Linux,', u'we'], [u'we', u'added'], [u'added', u'Apt'], [u'Apt', u'repos'], [u'repos', u'for'], [u'for', u'Debian'], [u'Debian', u'and'], [u'and', u'Ubuntu'], [u'Ubuntu', u'in'], [u'in', u'late'], [u'late', u'spring,'], [u'spring,', u'and'], [u'and', u'throughout'], [u'throughout', u'all']]
print test[0]
print type(test)
print collections.Counter(test).most_common(3)
Upvotes: 1
Views: 308
Reputation: 16556
As the error says, lists are not hashable. Another way to work around the problem is to go via strings: join each inner list with a separator (a space seems a good choice), count the joined strings, then split them again:
>>> [(i.split(' '),j) for i,j in collections.Counter(' '.join(i) for i in test).most_common(3)]
[([u'repos', u'for'], 2), ([u'grown', u'steadily.\xa0Starting'], 1), ([u'Linux', u'a'], 1)]
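One caveat worth spelling out: the join/split round trip only recovers the original pairs if no item contains the separator. The sketch below uses made-up data (`pairs` is hypothetical, not from the question) in which one item contains a space, and compares the result with counting tuples directly:

```python
import collections

# Hypothetical data: the first item of two of the pairs contains a space.
pairs = [[u'New York', u'City'], [u'New York', u'City'], [u'a', u'b']]

# Join each pair with a space, count the strings, then split again.
joined = collections.Counter(' '.join(p) for p in pairs)
via_join = [(key.split(' '), n) for key, n in joined.most_common(1)]
# The most common key splits into three pieces, so the pair boundary is lost.

# Counting tuples keeps each pair intact, whatever the items contain.
via_tuple = collections.Counter(tuple(p) for p in pairs).most_common(1)

print(len(via_join[0][0]), len(via_tuple[0][0]))
```

For the data in the question this is not an issue, since the items contain non-breaking spaces (`\xa0`) but no plain spaces, and `split(' ')` only splits on the plain space character.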
Upvotes: 0
Reputation: 8709
>>> print collections.Counter(map(tuple,test)).most_common(3)
[((u'repos', u'for'), 2), ((u'and', u'throughout'), 1), ((u'based', u'family'), 1)]
Upvotes: 2
Reputation: 53678
collections.Counter is based on a dictionary. As such, its keys need to be hashable, and lists aren't hashable.
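To see the difference directly, here is a minimal sketch. The helper `is_hashable` is a hypothetical name introduced for illustration; it is not part of `collections` or the standard library:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

pair_list = [u'repos', u'for']
pair_tuple = tuple(pair_list)

list_ok = is_hashable(pair_list)    # lists are mutable, so hash() raises TypeError
tuple_ok = is_hashable(pair_tuple)  # a tuple of strings can be hashed

print("list hashable: %s, tuple hashable: %s" % (list_ok, tuple_ok))
```

Note that a tuple is only hashable if everything inside it is hashable too, so a tuple containing a list would fail in the same way.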
If you want to count individual strings then you can extract the elements from each list using a generator expression, as below:
c = collections.Counter(word for pair in test for word in pair)
If you want to count the pairs, for example as 2-grams, then you need to convert each inner list into a tuple (which is hashable) and count those instead, which again can be done using a generator expression:
c2 = collections.Counter(tuple(pair) for pair in test)
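As a rough illustration of how the two counts differ, the sketch below runs both on a shortened, made-up sample (`sample` is an assumption, shaped like `test` from the question but not taken from it):

```python
import collections

# A shortened, made-up sample in the same shape as `test` from the question.
sample = [[u'repos', u'for'], [u'for', u'Linux'],
          [u'repos', u'for'], [u'for', u'the']]

# Count individual words by flattening the pairs.
word_counts = collections.Counter(word for pair in sample for word in pair)

# Count pairs by converting each inner list to a hashable tuple.
pair_counts = collections.Counter(tuple(pair) for pair in sample)

top_word = word_counts.most_common(1)
top_pair = pair_counts.most_common(1)
print(top_word)
print(top_pair)
```

Here the most common word is `for` (it appears in every pair), while the most common pair is `(repos, for)`, so which counter you want depends on whether the unit of interest is the word or the 2-gram.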
Upvotes: 1
Reputation: 117856
You need to convert the inner lists to tuples so they are hashable:
>>> from collections import Counter
>>> c = Counter(tuple(i) for i in test)
>>> c.most_common(3)
[(('repos', 'for'), 2),
(('Hat/Fedora/Oracle', 'Linux,'), 1),
(('year', 'ago,'), 1)]
Upvotes: 0