user2997872

Reputation: 39

Count unique tuples in nested list

I have a 2D array where each element is a pair of two tags, like ["NOUN", "VERB"], and I want to count how many times each unique pair occurs in a large dataset.

So far I have tried defaultdict(int) and Counter(), so that a pair is added if it has not been seen before and its value is increased by 1 if it has.

dTransition = Counter()
# dTransition = defaultdict(int)

# <s> is a start of sentence tag
pairs = [[('<s>', 'NOUN')], [('CCONJ', 'NOUN')], [('NOUN', 'SCONJ')], [('SCONJ', 'NOUN')]]

for pair in pairs:
      dTransition[pairs] += 1

This does not work as it does not accept two arguments. So I'm wondering whether there is an easy way to check if a key that is a pair of tags already exists in the dictionary and, if so, increase its value by 1.

Upvotes: 1

Views: 2872

Answers (3)

yatu

Reputation: 88226

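You need to flatten your list first, so that each tuple is an element of a single iterable. Tuples, unlike lists, are hashable, so they can be used as Counter keys. A simple option is to use itertools.chain and build a Counter from the resulting tuples:

from collections import Counter
from itertools import chain

Counter(chain(*pairs))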

Output

Counter({('<s>', 'NOUN'): 1, ('CCONJ', 'NOUN'): 1, 
         ('NOUN', 'SCONJ'): 1, ('SCONJ', 'NOUN'): 1})
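
To illustrate the hashability point, here is a minimal sketch that assumes the pairs list from the question. The names list_key_error and transition_counts are only illustrative. It uses chain.from_iterable, which flattens the same way as chain(*pairs) without unpacking the outer list into arguments:

```python
from collections import Counter
from itertools import chain

# <s> is a start of sentence tag (data copied from the question)
pairs = [[('<s>', 'NOUN')], [('CCONJ', 'NOUN')], [('NOUN', 'SCONJ')], [('SCONJ', 'NOUN')]]

# Using an inner list as a key fails, because lists are not hashable.
try:
    Counter()[pairs[0]] += 1
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

# Flattening yields the tuples themselves, which are hashable keys.
transition_counts = Counter(chain.from_iterable(pairs))
```

For a large dataset, chain.from_iterable may be the more convenient form, since it also accepts a generator of inner lists.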

Upvotes: 5

Code Pope

Reputation: 5449

Your approach with defaultdict was correct, but the dictionary key has to be the tuple itself, not the list that contains it. In your example the tuple is always the first element of each inner list:

import collections 
dTransition = collections.defaultdict(int)

# <s> is a start of sentence tag
pairs = [[('<s>', 'NOUN')], [('CCONJ', 'NOUN')], [('NOUN', 'SCONJ')], [('SCONJ', 'NOUN')],[('SCONJ', 'NOUN')]]

for pair in pairs:
      dTransition[pair[0]] += 1

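With this change the loop runs, and ('SCONJ', 'NOUN'), which appears twice in this list, ends up with a count of 2.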
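
If an inner list could ever hold more than one tuple, pair[0] would skip the rest. A possible variant, sketched under that assumption, lets Counter.update count every tuple in each inner list. The function name count_transitions is hypothetical and not part of the original code:

```python
from collections import Counter

def count_transitions(nested_pairs):
    """Count every tuple found in a list of lists of tuples."""
    counts = Counter()
    for inner in nested_pairs:
        # update() adds one to the count of each element of the inner list
        counts.update(inner)
    return counts

# <s> is a start of sentence tag (same data as above)
pairs = [[('<s>', 'NOUN')], [('CCONJ', 'NOUN')], [('NOUN', 'SCONJ')], [('SCONJ', 'NOUN')], [('SCONJ', 'NOUN')]]

dTransition = count_transitions(pairs)

# most_common(1) returns a one-element list of (key, count) tuples
most_common_pair, occurrences = dTransition.most_common(1)[0]
```

Counter is a dict subclass, so dTransition[('SCONJ', 'NOUN')] can be read just like with the defaultdict version.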

Upvotes: 0

D.Sanders

Reputation: 98

You can convert the list to a NumPy array and use the built-in np.unique function.

import numpy as np

# convert the nested list to an array with one row per pair
pairs = np.array(pairs).reshape(-1, 2)

# np.unique(..., axis=0) returns an array with only the unique rows
# (arrays have no .unique() method; np.unique is a function)
# len() returns the length (count) of unique pairs
count = len(np.unique(pairs, axis=0))
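
The count above is the number of distinct pairs, not how often each pair occurs. A rough sketch of how per-pair counts could be obtained with np.unique and return_counts=True follows. It assumes NumPy is installed and reuses the sample data from the other answer; the names rows and pair_counts are illustrative:

```python
import numpy as np

# <s> is a start of sentence tag (sample data with one repeated pair)
pairs = [[('<s>', 'NOUN')], [('CCONJ', 'NOUN')], [('NOUN', 'SCONJ')], [('SCONJ', 'NOUN')], [('SCONJ', 'NOUN')]]

# Shape (n_pairs, 2): one row per tag pair
rows = np.array(pairs).reshape(-1, 2)

# axis=0 compares whole rows; return_counts=True also returns occurrences
unique_rows, counts = np.unique(rows, axis=0, return_counts=True)

# Convert back to plain Python tuples and ints for dictionary lookups
pair_counts = {tuple(row.tolist()): int(n) for row, n in zip(unique_rows, counts)}
```

For plain tag strings, Counter is probably the simpler tool. The NumPy route mainly makes sense if the data is already in an array.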

Upvotes: 1
