Reputation: 39
I have a 2D array where each element is a pair of two tags, like ["NOUN", "VERB"], and I want to count the number of times each unique pair occurs in a large dataset.
So far I have tried using defaultdict(int) and Counter(), so that a pair is added if it has not been seen before, or its count is increased by 1 if it has.
dTransition = Counter()
# dTransition = defaultdict(int)
# <s> is a start of sentence tag
pairs = [[('<s>', 'NOUN')], [('CCONJ', 'NOUN')], [('NOUN', 'SCONJ')], [('SCONJ', 'NOUN')]]
for pair in pairs:
    dTransition[pairs] += 1
This does not work, as it does not accept two arguments. So I'm wondering if there is an easy way to check whether a key that is a 2D array already exists in the dictionary and, if so, increase its value by 1.
Upvotes: 1
Views: 2872
Reputation: 88226
You need to flatten your list first: the inner lists are not hashable, but the tuples they contain are. A simple option is to use itertools.chain and then build a Counter from the flattened sequence of tuples:
from collections import Counter
from itertools import chain
Counter(chain(*pairs))
Output
Counter({('<s>', 'NOUN'): 1, ('CCONJ', 'NOUN'): 1,
('NOUN', 'SCONJ'): 1, ('SCONJ', 'NOUN'): 1})
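As a slightly larger sketch of the same idea: the sample data below is made up for illustration (one pair is repeated so the counts differ), and the name transition_counts is an assumption. For a large dataset, chain.from_iterable flattens one level lazily, without unpacking the whole outer list into arguments.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data in the same nested shape as the question's `pairs`
pairs = [[('<s>', 'NOUN')], [('CCONJ', 'NOUN')], [('NOUN', 'SCONJ')],
         [('SCONJ', 'NOUN')], [('SCONJ', 'NOUN')]]

# Flatten one level lazily and count each (hashable) tuple
transition_counts = Counter(chain.from_iterable(pairs))

print(transition_counts[('SCONJ', 'NOUN')])  # 2
print(transition_counts.most_common(1))      # [(('SCONJ', 'NOUN'), 2)]
```

A Counter returns 0 for a pair it has never seen, so lookups need no explicit membership check.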
Upvotes: 5
Reputation: 5449
Your solution with defaultdict was correct, but you have to use the two values as a tuple for the dictionary key. In your example, the tuple is always the first element of each inner list:
import collections
dTransition = collections.defaultdict(int)
# <s> is a start of sentence tag
pairs = [[('<s>', 'NOUN')], [('CCONJ', 'NOUN')], [('NOUN', 'SCONJ')], [('SCONJ', 'NOUN')],[('SCONJ', 'NOUN')]]
for pair in pairs:
    dTransition[pair[0]] += 1
Then it works
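If the pairs arrive as lists such as ["NOUN", "VERB"], as described in the question, rather than as tuples, they can be turned into tuple keys inside the loop. This is a minimal sketch under that assumption; list_pairs and transition_counts are hypothetical names with made-up sample data.

```python
from collections import defaultdict

# Hypothetical input where each pair is a list rather than a tuple
list_pairs = [["<s>", "NOUN"], ["NOUN", "VERB"], ["NOUN", "VERB"]]

transition_counts = defaultdict(int)
for first_tag, second_tag in list_pairs:
    # A tuple is hashable and can be a dictionary key; a list cannot
    transition_counts[(first_tag, second_tag)] += 1

print(dict(transition_counts))
# {('<s>', 'NOUN'): 1, ('NOUN', 'VERB'): 2}
```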
Upvotes: 0
Reputation: 98
You can use NumPy's built-in np.unique function to do this.
import numpy as np
# convert the nested list to a NumPy array with one pair per row
pairs = np.array(pairs).reshape(-1, 2)
# np.unique with axis=0 returns the unique rows (pairs);
# return_counts=True also returns how many times each one occurs
unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)
Upvotes: 1