qwerty ayyy

Reputation: 385

Python 3 - Iterate through a corpus and count each word/tag combination

I have a corpus that is a list of tuples, where each tuple contains a word and its POS tag. Given every word and every POS tag that exists in the corpus, I want to iterate through the corpus and record how many times each word/tag combination occurs. If a combination does not occur in the corpus, its count should be 0.

     possible_tags = ['Verb','Noun','Det']

     possible_words = ['Merger', 'proposed', 'Wards', 'protected', 'A']

     corpus = [('Merger', 'Noun'), ('proposed', 'Verb'), ('Wards', 'Noun'), ('protected', 'Verb'), ('A', 'Det'), ('Merger','Noun')]

     output = {'Merger_Noun':2, 'Merger_Verb':0, 'Merger_Det':0, 'proposed_Noun':0, 'proposed_Verb':1, 'proposed_Det':0, ....... }

Upvotes: 1

Views: 321

Answers (1)

sai

Reputation: 462

Try building a dictionary keyed by "word_tag" strings: first set every possible combination to 0, then walk through the corpus and increment the matching key.

possible_tags = ['Verb','Noun','Det']

possible_words = ['Merger', 'proposed', 'Wards', 'protected', 'A']

corpus = [('Merger', 'Noun'), ('proposed', 'Verb'), ('Wards', 'Noun'), ('protected', 'Verb'), ('A', 'Det'), ('Merger','Noun')]

# Initialize output to an empty dictionary
output = {}

# Dictionary initialization: every word_tag combination starts at 0
for each_word in possible_words:
    for each_tag in possible_tags:
        key = each_word + "_" + each_tag
        output[key] = 0


# Iterate through the corpus
for word, tag in corpus:
    # Unpack each (word, tag) tuple and increment the count stored under its "word_tag" key
    output[word + "_" + tag] += 1
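A more compact alternative, sketched here assuming the same three lists: collections.Counter tallies the (word, tag) tuples directly, and itertools.product enumerates every word/tag combination. Combinations that never appear come out as 0 because indexing a Counter with a missing key returns 0. The name pair_counts is just an illustrative choice. One behavioral difference from the loop above: a corpus pair whose word or tag is not in the possible lists is silently left out instead of raising KeyError.

```python
from collections import Counter
from itertools import product

possible_tags = ['Verb', 'Noun', 'Det']
possible_words = ['Merger', 'proposed', 'Wards', 'protected', 'A']
corpus = [('Merger', 'Noun'), ('proposed', 'Verb'), ('Wards', 'Noun'),
          ('protected', 'Verb'), ('A', 'Det'), ('Merger', 'Noun')]

# Count how many times each (word, tag) tuple occurs in the corpus.
# pair_counts is an illustrative name, not from the original answer.
pair_counts = Counter(corpus)

# One entry per word/tag combination (words outer, tags inner).
# A Counter returns 0 for keys it has never seen, so unseen combos get 0.
output = {
    f"{word}_{tag}": pair_counts[(word, tag)]
    for word, tag in product(possible_words, possible_tags)
}

print(output['Merger_Noun'])  # 2
print(output['Merger_Verb'])  # 0
print(len(output))            # 15 (5 words x 3 tags)
```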

Upvotes: 1
