Reputation: 13
I am sorry if this has been asked elsewhere, but I have been trying and googling all day to solve this, to no avail.
I wish to initialise an empty dict as such:
empty_dict = {}
Then I take rows from a CSV file, saved into a variable, let's say saved_word_list. The rows in saved_word_list contain sentences, and each sentence is labelled either A or B. What I would like to do is populate empty_dict with each unique word in a sentence, so that each word is counted only once per line and added to the correct nested portion of the dict.
An example:
row_1 = {"this is a fine day to do a lot of coding"}
This row would be labelled A, so our dict would change to:
empty_dict = {'this':{'A':1,'B':0}, 'is':{'A':1, 'B':0}, 'a':{'A':1, 'B':0}.....}
Below is as much as I have so far, but I would like to understand how to get from here to the result shown above. Any ideas how I can get to this point?
for (sentence, label) in zip(saved_word_list, labels):
    keys = set(labels)
    values = set(sentence.split())
    for key in keys:
        for value in values:
            if value not in empty_Dict:
                empty_Dict[value][key] = value
            else:
                empty_Dict[value][key] += 1
Upvotes: 0
Views: 326
Reputation: 104
This should do exactly what you want.
def template_inner_dict():
    """
    Creates a template dictionary with the row labels as keys
    and 0 as values
    """
    return {i: 0 for i in labels}

empty_dict = {}
for (sentence, label) in zip(saved_word_list, labels):
    tokens = set(sentence.split())
    for token in tokens:
        # insert a template dict for the value not present in the dict
        if token not in empty_dict:
            empty_dict[token] = template_inner_dict()
        # increment the label count associated with token by 1
        empty_dict[token][label] += 1
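For reference, here is the same idea as a self-contained sketch you can run on its own. The function name count_words_by_label and the two sample sentences are made up for illustration; they stand in for the rows you read from your CSV file, and I am assuming labels holds one label per row in the same order.

```python
def count_words_by_label(sentences, labels):
    """Count, for each word, how many rows of each label contain it."""
    # every distinct label gets a slot in each inner dict
    label_names = sorted(set(labels))
    counts = {}
    for sentence, label in zip(sentences, labels):
        # set() makes sure a word is counted at most once per row
        for token in set(sentence.split()):
            if token not in counts:
                counts[token] = {name: 0 for name in label_names}
            counts[token][label] += 1
    return counts


# hypothetical sample data standing in for the CSV rows
saved_word_list = [
    "this is a fine day to do a lot of coding",
    "this is not a fine day",
]
labels = ["A", "B"]

word_counts = count_words_by_label(saved_word_list, labels)
print(word_counts["this"])    # {'A': 1, 'B': 1}
print(word_counts["coding"])  # {'A': 1, 'B': 0}
print(word_counts["a"])       # {'A': 1, 'B': 1}
```

The attempt in the question fails mainly because empty_Dict[value][key] is assigned before empty_Dict[value] exists, which raises a KeyError, and because it loops over every label instead of incrementing only the label of the current row.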
Upvotes: 1