Henk Straten

Reputation: 1445

Create a dictionary of words that are in one document but not in the other

I have the following sample data

docs_word = ["this is a test", "this is another test"]
docs_txt = ["this is a great test", "this is another test"]

I want to create two dictionaries of the words in the sample files, compare them, and store the words that are in docs_txt but not in docs_word in a separate dictionary. I wrote the following:

from collections import Counter

count_txtDoc = Counter()
for file in docs_word:
  words = file.split(" ")
  count_txtDoc.update(words)

count_wrdDoc = Counter()
for file in docs_txt:
  words = file.split(" ")
  count_wrdDoc.update(words)

#Create a list of the dictionary keys
words_worddoc = count_wrdDoc.keys()
words_txtdoc = count_txtDoc.keys()

#Look for values that are in word_doc but not in txt_doc

count_all = Counter()
for val in words_worddoc:
  if val not in words_txtdoc:
    count_all.update(val)
    print(val)

The loop prints the correct value: "great".

However if I print:

print(count_all)

I get the following output:

Counter({'a': 1, 'r': 1, 'e': 1, 't': 1, 'g': 1})

While I expected

Counter({'great': 1})

Any thoughts on how I can achieve this?

Upvotes: 0

Views: 51

Answers (1)

Moses Koledoye

Reputation: 78554

Update the counter using an iterable containing the word, not the word itself (since the word is also iterable):

count_all.update([val])
#                ^   ^ 
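
To see why the original call split the word into letters, here is a minimal sketch comparing the two calls. It assumes Python 3, and the names `by_chars` and `by_word` are illustrative only, not from the question:

```python
from collections import Counter

# Counter.update() iterates over its argument and counts each element.
by_chars = Counter()
by_chars.update("great")     # a string iterates character by character

by_word = Counter()
by_word.update(["great"])    # a one-element list yields the whole word

print(by_chars)  # one entry per letter: g, r, e, a, t
print(by_word)   # Counter({'great': 1})
```

Incrementing the key directly, as in `count_all[val] += 1`, should have the same effect as `count_all.update([val])`.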

However, you may not need to create a new counter if you only need the words themselves. You can take the symmetric difference of the keys, which returns the words that appear on one side only:

words_worddoc = count_wrdDoc.viewkeys() # use .keys() in Py3
words_txtdoc = count_txtDoc.viewkeys()  # use .keys() in Py3

print(words_txtdoc ^ words_worddoc)
# set(['great'])
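
For Python 3, a self-contained sketch of the same idea might look like the following. The names `count_word`, `count_txt`, `only_in_txt` and `either_side` are my own, chosen so that each counter is named after the data it holds. Since the question asks only for words in docs_txt that are missing from docs_word, the one-sided difference is shown next to the symmetric one:

```python
from collections import Counter

docs_word = ["this is a test", "this is another test"]
docs_txt = ["this is a great test", "this is another test"]

# Count the words of each collection.
count_word = Counter(w for doc in docs_word for w in doc.split())
count_txt = Counter(w for doc in docs_txt for w in doc.split())

# In Python 3, dict key views support set operations directly.
only_in_txt = count_txt.keys() - count_word.keys()   # one-sided difference
either_side = count_txt.keys() ^ count_word.keys()   # symmetric difference

print(only_in_txt)   # {'great'}
print(either_side)   # {'great'}
```

With this sample data both results coincide. They would differ as soon as docs_word contained a word that docs_txt lacks.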

If you want the count also, you can compute the symmetric difference between both counters like so:

count_all = (count_wrdDoc - count_txtDoc) | (count_txtDoc - count_wrdDoc)

print(count_all)
# Counter({'great': 1})
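
If only one direction matters (words in docs_txt but not in docs_word, as the question states), a single Counter subtraction may be enough. This is a rough sketch with hypothetical names (`count_word`, `count_txt`, `extra_in_txt`), built from the sample sentences in the question:

```python
from collections import Counter

count_word = Counter("this is a test this is another test".split())
count_txt = Counter("this is a great test this is another test".split())

# Counter subtraction keeps only keys whose resulting count is positive,
# so words that are equally frequent on both sides disappear.
extra_in_txt = count_txt - count_word

print(extra_in_txt)  # Counter({'great': 1})
```

Note that this also reports words that occur in both collections but more often in docs_txt, with the surplus as the count. If that is not wanted, filtering on the key difference is the safer choice.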

Upvotes: 1
