user13590204

How to deal with python dictionary assignment?

I am currently struggling to solve this issue and need help. What I am trying to get is the following (expected answer):

{'and': 1.91629073, 'document': 1.22314355, 'first': 1.51082562, 'is': 1., 'one': 1.91629073, 'second': 1.91629073, 'the': 1., 'third': 1.91629073, 'this': 1.}

But what I am actually getting is this:

{'and': 2.791759469228055, 'document': 3.0794415416798357, 'first': 2.9459101490553135, 'is': 3.1972245773362196, 'one': 2.791759469228055, 'second': 2.791759469228055, 'the': 3.1972245773362196, 'third': 2.791759469228055, 'this': 3.1972245773362196}

The main point of consideration is this line. I believe the code and formula are right, but for some reason it's picking up a few extra numerical values in len(corpus):

vocab1[word2] = 1+(math.log(1+len(corpus)/1+count))

The actual code starts from here:

corpus = [
 'this is the first document',
 'this document is the second document',
 'and this is the third one',
 'is this the first document',]

import math
unique_words = set() # at first we will initialize an empty set
lenofcorpus= len(corpus)
# print(lenofcorpus)
vocab1 = dict()
# vocab = dict()
# check if it's a list or not
if isinstance(corpus, (list,)):
    for row in corpus: # for each review in the dataset
        for word in row.split(" "): # for each word in the review; split() turns the string into a list of words
            if len(word) < 2:
                continue
            unique_words.add(word)
    unique_words = sorted(list(unique_words))
    # print(unique_words)
    for idx, word2 in enumerate(unique_words):
        count = 0
        for sentence in corpus:
            if word2 in sentence:
                count += 1
                # print(word2, count)
                vocab1[word2] = count
                # print(lenofcorpus)
                vocab1[word2] = 1+(math.log(1+len(corpus)/1+count))  # it's taking the log of 12/2 when it should take 5/2; it's adding 7 or 6 extra
    print(vocab1)

I want to know how to get the desired answer. Secondly, what was the thought process for arriving at that answer, and what am I doing wrong? It would really help if someone could explain. I know I am also doing something wrong with the dictionary loop and the assignment. FYI: len(corpus) = 4, since the corpus has 4 sentences.
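The formula only needs two inputs per word: the number of sentences (4 here) and `count`, the number of sentences that contain the word. A minimal sketch that prints those counts on their own, assuming simple whitespace tokenization (the name `doc_freq` is illustrative, not from the question):

```python
corpus = [
    'this is the first document',
    'this document is the second document',
    'and this is the third one',
    'is this the first document',
]

# doc_freq (hypothetical name): for each word, how many sentences contain it
doc_freq = {}
for sentence in corpus:
    # set() so a word repeated inside one sentence ("document" in the
    # second sentence) is only counted once for that sentence
    for word in set(sentence.split()):
        doc_freq[word] = doc_freq.get(word, 0) + 1

for word in sorted(doc_freq):
    print(word, doc_freq[word])
```

For this corpus the counts come out as 1 for and, one, second and third; 2 for first; 3 for document; and 4 for is, the and this.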

Answers (1)

Dave

You are missing parentheses. The results you say you want correspond to this:

vocab1[word2] = 1+(math.log((1+len(corpus))/(1+count)))

or spelled out:

numerator = (1+len(corpus))
denominator = (1+count)
result = 1+math.log(numerator/denominator)
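Plugging each possible count into the parenthesized formula gives exactly the four distinct values in the expected output. A small sketch (`smoothed_idf` is a hypothetical helper name). The formula looks like the smoothed IDF that scikit-learn's `TfidfVectorizer` uses with its default `smooth_idf=True`, though the question doesn't say where the expected numbers came from:

```python
import math

n_docs = 4  # len(corpus): the corpus has 4 sentences

def smoothed_idf(count, n_docs):
    # 1 + ln((1 + n_docs) / (1 + count)); both sums are grouped before dividing
    return 1 + math.log((1 + n_docs) / (1 + count))

# A word can appear in 1, 2, 3 or all 4 sentences
for count in range(1, n_docs + 1):
    print(count, round(smoothed_idf(count, n_docs), 8))
```

This prints 1.91629073, 1.51082562, 1.22314355 and 1.0 for counts 1 to 4, matching the expected values for and/one/second/third, first, document, and is/the/this respectively.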

What you had originally written is equivalent to

vocab1[word2] = 1+math.log(1+(len(corpus)/1)+count)
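Because `/` binds more tightly than `+`, the argument of the log in the original line is 1 + 4 + count, i.e. 6, 7, 8 or 9, not 12/2 as the comment in the question guesses. Evaluating the original expression for each count reproduces the numbers printed in the question (a sketch; the helper names are illustrative):

```python
import math

n_docs = 4  # len(corpus)

def as_written(count):
    # The question's expression, unchanged
    return 1 + (math.log(1 + n_docs / 1 + count))

def grouping_made_explicit(count):
    # How Python actually groups it: n_docs / 1 first, then the additions
    return 1 + math.log(1 + (n_docs / 1) + count)

for count in (1, 2, 3, 4):
    assert as_written(count) == grouping_made_explicit(count)
    # middle column is the log's argument: 6.0, 7.0, 8.0, 9.0
    print(count, 1 + n_docs / 1 + count, as_written(count))
```

The last column matches the question's output: about 2.7918 for count 1 (and, one, second, third), 2.9459 for 2 (first), 3.0794 for 3 (document) and 3.1972 for 4 (is, the, this).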

It's a pretty common mistake to write x/y+z when you meant x/(y+z), or x+y/z when you meant (x+y)/z; you've managed to do both at the same time.
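Putting the parenthesized formula back into the question's loop, a minimal sketch might look like this. Two changes beyond the parentheses are assumptions about intent rather than part of the answer above: membership is tested against the sentence's words instead of with a substring test (`'is' in 'this document'` is True), and the dictionary is assigned once, after counting finishes. In the original nesting the assignment runs once per matching sentence; the last write uses the final count, so the printed values come out the same, but the intent is clearer outside the inner loop.

```python
import math

corpus = [
    'this is the first document',
    'this document is the second document',
    'and this is the third one',
    'is this the first document',
]

# Collect the vocabulary, skipping one-character tokens as the question does
unique_words = sorted({word for row in corpus for word in row.split() if len(word) >= 2})

n_docs = len(corpus)
vocab1 = {}
for word in unique_words:
    # Number of sentences that contain the word as a whole token
    count = sum(1 for sentence in corpus if word in sentence.split())
    # Assign once, after counting; note the two parenthesized sums
    vocab1[word] = 1 + math.log((1 + n_docs) / (1 + count))

print(vocab1)
```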
