Reputation: 1445
I have the following sample data
docs_word = ["this is a test", "this is another test"]
docs_txt = ["this is a great test", "this is another test"]
What I want to do is build two dictionaries of the words in the sample files, compare them, and store the words that appear in docs_txt but not in docs_word in a separate dictionary. So I wrote the following:
from collections import Counter

count_txtDoc = Counter()
for file in docs_word:
    words = file.split(" ")
    count_txtDoc.update(words)
count_wrdDoc = Counter()
for file in docs_txt:
    words = file.split(" ")
    count_wrdDoc.update(words)
#Create a list of the dictionary keys
words_worddoc = count_wrdDoc.keys()
words_txtdoc = count_txtDoc.keys()
#Look for values that are in word_doc but not in txt_doc
count_all = Counter()
for val in words_worddoc:
    if val not in words_txtdoc:
        count_all.update(val)
        print(val)
The loop prints the correct value: "great".
However if I print:
print(count_all)
I get the following output:
Counter({'a': 1, 'r': 1, 'e': 1, 't': 1, 'g': 1})
While I expected
Counter({'great': 1})
Any thoughts on how I can achieve this?
Upvotes: 0
Views: 51
Reputation: 78554
Update the counter with an iterable containing the word, not the word itself. A string is itself iterable, so update(val) counts each character of the word:
count_all.update([val])
#                ^   ^
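To make the difference concrete, here is a minimal sketch comparing the two calls. The names by_chars and by_word are just for illustration and are not from the question:

```python
from collections import Counter

val = "great"

# Passing the string itself: update() iterates over it character by character
by_chars = Counter()
by_chars.update(val)

# Passing a one-element list: update() sees a single item, the whole word
by_word = Counter()
by_word.update([val])

print(by_chars)  # one entry per character: 'g', 'r', 'e', 'a', 't'
print(by_word)   # Counter({'great': 1})
```

The same applies to any Counter built from a bare string, e.g. Counter("great") also counts characters.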
However, you may not need to create a new counter if you only need the items themselves. You can take the symmetric difference of the keys:
words_worddoc = count_wrdDoc.viewkeys() # use .keys() in Py3
words_txtdoc = count_txtDoc.viewkeys() # use .keys() in Py3
print(words_txtdoc ^ words_worddoc)
# set(['great'])
If you want the count also, you can compute the symmetric difference between both counters like so:
count_all = (count_wrdDoc - count_txtDoc) | (count_txtDoc - count_wrdDoc)
print(count_all)
# Counter({'great': 1})
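Note that the symmetric difference reports words missing from either side. If you only want the words that are in docs_txt but not in docs_word, as the question asks, something like the following Python 3 sketch may be closer. The names count_word, count_txt, only_in_txt and either_side are my own, and I am assuming whitespace splitting is good enough for your files:

```python
from collections import Counter

docs_word = ["this is a test", "this is another test"]
docs_txt = ["this is a great test", "this is another test"]

# Count words per collection; split() with no argument splits on any whitespace
count_word = Counter(w for doc in docs_word for w in doc.split())
count_txt = Counter(w for doc in docs_txt for w in doc.split())

# Words that occur in docs_txt but never in docs_word, keeping their docs_txt counts
only_in_txt = Counter({w: n for w, n in count_txt.items() if w not in count_word})

# For comparison: symmetric difference of the key views (set-like in Python 3)
either_side = count_txt.keys() ^ count_word.keys()

print(only_in_txt)  # Counter({'great': 1})
print(either_side)  # {'great'}
```

I filter on the keys here rather than writing count_txt - count_word, because Counter subtraction would also keep words that appear in both collections but more often in docs_txt.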
Upvotes: 1