Reputation: 7644
I have a list:
a = [(['7', '8'], ['4', '7'],['3', '4'],['3', '8'],['4', '8'],...............['3','4'])]
I want to create two columns that give the frequency of each pair in the list. For example:
bigram frequency
['7','8'] 2
['4','7'] 3
['3', '4'] 6
and so on.
Also, entries like ['7','8'] and ['8','7'] should be treated as the same bigram (duplicates): only one entry should appear in the column, with their frequencies added together.
I was trying to use
from collections import Counter
and loop over the list, but I got the error:
unhashable type: list
Upvotes: 1
Views: 3052
Reputation: 63757
You can use itertools.groupby to group the sorted list of items. The grouping key can be a custom function that puts each item into a canonical order. For a pair, a simple comparison is enough to produce that order.
Considering
a = [(['7', '8'], ['4', '7'],['3', '4'],['3', '8'],['4', '8'],['4','3'])]
from itertools import groupby
key = lambda tup: tup if tup[0] < tup[1] else tup[::-1]
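# loop variable is k, not key, so the lambda above is not shadowed (Python 2 leaks comprehension variables)
[(k, len(list(values)))
 for k, values in groupby(sorted(a[0], key=key), key=key)]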
Out[42]:
[(['3', '4'], 2),
(['3', '8'], 1),
(['4', '7'], 1),
(['4', '8'], 1),
(['7', '8'], 1)]
If the lists have more than two items, consider using sorted as the key. This may be less efficient, but it is convenient:
[(key, len(list(values)))
for key, values in groupby(sorted(a[0], key = sorted), key = sorted)]
Out[37]:
[(['3', '4'], 2),
(['3', '8'], 1),
(['4', '7'], 1),
(['4', '8'], 1),
(['7', '8'], 1)]
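For reference, here is a self-contained Python 3 sketch of the groupby approach above. The helper name count_unordered is made up for illustration and is not part of itertools.

```python
from itertools import groupby

def count_unordered(pairs):
    """Count lists that are equal up to ordering (hypothetical helper)."""
    # sorted() returns a new list, so it serves as an order-insensitive key
    ordered = sorted(pairs, key=sorted)
    # groupby only merges adjacent items, which is why the input is sorted first
    return [(k, len(list(group))) for k, group in groupby(ordered, key=sorted)]

a = [(['7', '8'], ['4', '7'], ['3', '4'], ['3', '8'], ['4', '8'], ['4', '3'])]
result = count_unordered(a[0])
print(result)
```

Sorting costs O(n log n), so for large inputs a Counter-based approach is likely the cheaper option.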
Upvotes: 0
Reputation: 107337
Lists are not hashable, so they cannot be used as dictionary keys (and Counter is a dict subclass). You need to convert them to a hashable object; a tuple is a suitable choice here:
In [5]: Counter(map(tuple, a[0])).items()
Out[5]:
[(('4', '7'), 1),
(('4', '8'), 1),
(('7', '8'), 1),
(('3', '4'), 2),
(('3', '8'), 1)]
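A small sketch of why the original attempt failed and why the tuple conversion fixes it. The sample data below is invented for illustration.

```python
from collections import Counter

pairs = [['7', '8'], ['4', '7'], ['7', '8']]

try:
    Counter(pairs)  # each element must be hashable to become a dict key
except TypeError as exc:
    error = str(exc)
    print(error)

# Tuples are immutable and hashable, so they work as keys
counts = Counter(map(tuple, pairs))
print(counts[('7', '8')])
```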
If you want lists with the same items in a different order to count as the same, sort each one before passing it to Counter:
In [7]: a
Out[7]:
[(['7', '8'],
['4', '7'],
['3', '4'],
['3', '8'],
['4', '8'],
['3', '4'],
['7', '4'])]
In [8]: Counter(tuple(sorted(i)) for i in a[0])
Out[8]: Counter({('4', '7'): 2, ('3', '4'): 2, ('3', '8'): 1, ('7', '8'): 1, ('4', '8'): 1})
Note that your numbers are strings: if any of them has more than one digit, you should convert them to integers before sorting; otherwise they will be sorted lexicographically.
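A minimal sketch of the difference, using made-up multi-digit data. The counts come out the same either way, since a string sort still gives every pair one consistent order; what changes is how the key reads, for example ('10', '9') versus (9, 10).

```python
from collections import Counter

pairs = [['10', '9'], ['9', '10'], ['2', '10']]

# As strings, '10' sorts before '9' because '1' < '9'
string_order = sorted(['10', '9'])
# With key=int the comparison is numeric
numeric_order = sorted(['10', '9'], key=int)
print(string_order, numeric_order)

# Convert to int first so each key is in numeric order
counts = Counter(tuple(sorted(map(int, p))) for p in pairs)
print(counts)
```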
Upvotes: 3
Reputation: 1231
Try this:
from collections import Counter
a = [(['7', '8'], ['4', '7'],['3', '4'],['3', '8'],['4', '8'],['3','4'],['7','8'],['8','7'],['4','3'])]
frequency_list = Counter(tuple(sorted(i)) for i in a[0])
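# Python 3 print(); the original used Python 2 print statements
print("bigram", "frequency")
for key, val in frequency_list.items():
    print(key, val)
The output is as follows (row order can differ between Python versions):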
bigram frequency
('4', '7') 1
('4', '8') 1
('7', '8') 3
('3', '4') 3
('3', '8') 1
Upvotes: 3
Reputation: 726
It will work using Counter if you change your list to:
a = [('7', '8'), ... ('4', '7')]
Or you can map your lists to tuples, because tuples are hashable but lists are not.
[Update] To treat reordered pairs as equal, sort each list before converting it to a tuple:
Counter(map(lambda x: tuple(sorted(x)), a[0])).items()
(Based on @Kasramvd's answer.)
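To get the two-column layout the question asks for, one possible sketch uses Counter.most_common to list the bigrams from most to least frequent. The sample data and the column width of 12 are arbitrary choices for illustration.

```python
from collections import Counter

a = [(['7', '8'], ['4', '7'], ['3', '4'], ['8', '7'], ['4', '3'], ['3', '4'])]

# Sort each pair so ['7', '8'] and ['8', '7'] share one key
counts = Counter(tuple(sorted(pair)) for pair in a[0])

# Build the header and one row per bigram, most frequent first
rows = ["{:<12}{}".format("bigram", "frequency")]
for bigram, freq in counts.most_common():
    rows.append("{:<12}{}".format(str(list(bigram)), freq))
print("\n".join(rows))
```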
Upvotes: 2