Shubham R

Reputation: 7644

Frequency of a list of tuples

I have a list:

a = [(['7', '8'], ['4', '7'],['3', '4'],['3', '8'],['4', '8'],...............['3','4'])]

I want to create 2 columns which give me the frequency of the tuples in the list. For example:

bigram      frequency
['7','8']     2
['4','7']     3
['3', '4']    6

and so on.

Also, entries like ['7','8'] and ['8','7'] should be treated as the same (duplicates). Only one of them should appear in the column, and its frequency should include both.

I was trying to use

from collections import Counter

and loop over the list, but I got this error:

unhashable type: list

Upvotes: 1

Views: 3052

Answers (4)

Abhijit

Reputation: 63757

You can use itertools.groupby to group the sorted list of items. The grouping key can be a custom function that puts each item into a canonical order. For a two-element list, a simple comparison is enough to produce that order:

Considering

a = [(['7', '8'], ['4', '7'],['3', '4'],['3', '8'],['4', '8'],['4','3'])]


from itertools import groupby
# Normalise each pair so that ['4', '3'] and ['3', '4'] share one key
norm = lambda tup: tup if tup[0] < tup[1] else tup[::-1]
[(key, len(list(values)))
 for key, values in groupby(sorted(a[0], key=norm), key=norm)]
Out[42]: 
[(['3', '4'], 2),
 (['3', '8'], 1),
 (['4', '7'], 1),
 (['4', '8'], 1),
 (['7', '8'], 1)]

If a list can contain more than two items, consider using sorted as the key. This may not be the most efficient option, but it is convenient:

[(key,  len(list(values)))
 for key, values in groupby(sorted(a[0], key = sorted), key = sorted)]
Out[37]: 
[(['3', '4'], 2),
 (['3', '8'], 1),
 (['4', '7'], 1),
 (['4', '8'], 1),
 (['7', '8'], 1)]
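
For reference, the same sort-then-group idea can be written as a small self-contained Python 3 function. This is only a sketch: the name count_groups is made up for illustration, and it normalises every inner list with sorted up front instead of passing a key function twice.

```python
from itertools import groupby

def count_groups(pairs):
    """Count lists that are equal up to element order, via sort + groupby."""
    # Normalise each inner list so ['4', '3'] and ['3', '4'] compare equal.
    normalised = sorted(sorted(p) for p in pairs)
    # groupby only merges adjacent equal items, hence the outer sort above.
    return [(key, sum(1 for _ in group)) for key, group in groupby(normalised)]

a = [(['7', '8'], ['4', '7'], ['3', '4'], ['3', '8'], ['4', '8'], ['4', '3'])]
result = count_groups(a[0])
print(result)
```

Because groupby compares with ==, the keys can stay as lists here; no conversion to a hashable type is needed.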

Upvotes: 0

Kasravnd

Reputation: 107337

Lists are not hashable, so they cannot be used as dictionary keys. You need to convert them to a hashable object, and a tuple is a suitable choice here:

In [5]: Counter(map(tuple, a[0])).items()
Out[5]: 
[(('4', '7'), 1),
 (('4', '8'), 1),
 (('7', '8'), 1),
 (('3', '4'), 2),
 (('3', '8'), 1)]
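
The hashability point can be checked directly. A minimal sketch (the variable names are illustrative only):

```python
pair = ['7', '8']

try:
    hash(pair)                 # lists are mutable, so they define no hash
    list_is_hashable = True
except TypeError as exc:
    list_is_hashable = False
    print(exc)                 # the same kind of error Counter raises for lists

# A tuple of hashable items is itself hashable, so it can be a dict key.
tuple_hash = hash(tuple(pair))
lookup = {tuple(pair): 1}
```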

If you want lists with the same elements in a different order to count as the same, sort each one and then pass the results to Counter:

In [7]: a
Out[7]: 
[(['7', '8'],
  ['4', '7'],
  ['3', '4'],
  ['3', '8'],
  ['4', '8'],
  ['3', '4'],
  ['7', '4'])]

In [8]: Counter(tuple(sorted(i)) for i in a[0])
Out[8]: Counter({('4', '7'): 2, ('3', '4'): 2, ('3', '8'): 1, ('7', '8'): 1, ('4', '8'): 1})

Note that your numbers are strings. If any of them have more than one digit, you should convert them to integers before sorting; otherwise they will be sorted lexicographically.
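
To illustrate the difference, here is a short sketch with made-up multi-digit data (the pairs list is hypothetical, not from the question). It uses key=int so the keys stay strings but are ordered numerically:

```python
from collections import Counter

# Hypothetical data with multi-digit values, stored as strings.
pairs = [['10', '9'], ['9', '10'], ['2', '10']]

# Plain string sorting compares character by character: '10' < '9'.
lexicographic = Counter(tuple(sorted(p)) for p in pairs)

# Sorting with key=int orders the strings by their numeric value.
numeric = Counter(tuple(sorted(p, key=int)) for p in pairs)

print(lexicographic)
print(numeric)
```

In this small example the counts come out the same either way, since both orderings are applied consistently; what changes is the order of the elements inside each key.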

Upvotes: 3

Mr. A

Reputation: 1231

Try this:

from collections import Counter

a = [(['7', '8'], ['4', '7'],['3', '4'],['3', '8'],['4', '8'],['3','4'],['7','8'],['8','7'],['4','3'])]

frequency_list = Counter(tuple(sorted(i)) for i in a[0])

print("bigram", "frequency")
for key, val in frequency_list.items():
    print(key, val)

The output is as follows (Python 3.7+, where Counter keeps insertion order):

bigram frequency
('7', '8') 3
('4', '7') 1
('3', '4') 3
('3', '8') 1
('4', '8') 1
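
If you want the two columns aligned and ordered by frequency, something like the following Python 3 sketch may help. The column width of 14 is an arbitrary choice, and most_common() is used only to put the most frequent bigrams first.

```python
from collections import Counter

a = [(['7', '8'], ['4', '7'], ['3', '4'], ['3', '8'], ['4', '8'],
      ['3', '4'], ['7', '8'], ['8', '7'], ['4', '3'])]

counts = Counter(tuple(sorted(pair)) for pair in a[0])

# most_common() orders by descending count; ties keep first-seen order.
rows = [f"{str(list(bigram)):<14}{freq}" for bigram, freq in counts.most_common()]

print(f"{'bigram':<14}frequency")
print("\n".join(rows))
```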

Upvotes: 3

Jason

Reputation: 726

It will work using Counter if you change your list to: a = [('7', '8'), ... ('4', '7')]

Or you can map your lists to tuples, because tuples are hashable but lists are not.

[Update] Sort each of your lists and map it to a tuple first: Counter(map(lambda x: tuple(sorted(x)), a[0])).items() (Based on @Kasramvd).

Upvotes: 2
