DannyBoy

Reputation: 77

Key Error in Python Script

When I run the following code I get a KeyError on line 12:

import math
from collections import Counter

def retrieve():

    wordFrequency = {'bit':{1:3,2:4,3:19,4:0},'red':{1:0,2:0,3:15,4:0},'dog':{1:3,2:0,3:4,4:5}}
    search = {'bit':1,'dog':3,'shoe':5}

    sizeFileVec = {}
    for word, innerDict in wordFrequency.iteritems():
        for fileNum, appearances in innerDict.iteritems():
            sizeFileVec[fileNum] += appearances ** 2
            for fileNum in sizeFileVec:
                sizeFileVec[fileNum] = math.sqrt(sizeFileVec[fileNum])

    results = []
    for word, occurrences in search.iteritems():
        file_relevancy = Counter()
        for fileNum, appear_in_file in wordFrequency.get(word, {}).iteritems():
            file_relevancy[fileNum] += (occurrences * appear_in_file) / sizeFileVec[fileNum]

        results = [fileNum for (fileNum, count) in file_relevancy.most_common()]

    return results

print retrieve()

The part that raises the error is supposed to go through the inner dictionaries of wordFrequency, sum the squares of the values for each file number, and then take the square root of that sum (there are 4 files). For example, for file 1 it is sqrt(3^2 + 0^2 + 3^2).

results is supposed to be a list of the 4 files, ordered from most to least relevant to the query. So in this example:

          bit     dog      shoe
File 1     3       3         0
File 2     4       0         0
File 3    19       4         0
File 4     0       5         0

Search     1       3         5

sim(1,S) = ((3 * 1) + (3 * 3) + (0 * 5)) / (sqrt(3^2 + 3^2 + 0^2) * sqrt(1^2 + 3^2 + 5^2)) = 0.478

The scalar (dot) product of the file vector and the search vector is taken, then this is divided by the product of the magnitudes of the two vectors.

This is done between the other 3 files and the search and stored in a list.

The list is then returned ordered from most relevant to least relevant.

sim(2,S) = ((4 * 1) + (0 * 3) + (0 * 5)) / (sqrt(4^2 + 0^2 + 0^2) * sqrt(1^2 + 3^2 + 5^2)) = 0.169

sim(3,S) = ((19 * 1) + (4 * 3) + (0 * 5)) / (sqrt(19^2 + 4^2 + 0^2) * sqrt(1^2 + 3^2 + 5^2)) = 0.26987

sim(4,S) = ((0 * 1) + (5 * 3) + (0 * 5)) / (sqrt(0^2 + 5^2 + 0^2) * sqrt(1^2 + 3^2 + 5^2)) = 0.507

Therefore [4,1,3,2] should be returned
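
As a sanity check on the hand arithmetic, here is a minimal standalone sketch that recomputes the four similarities directly from the table. It is not the retrieve() function itself; the names files, search and cosine are made up for illustration, and the vectors only contain the three columns shown in the table (bit, dog, shoe).

```python
import math

# Term counts per file, in the table's column order: bit, dog, shoe.
files = {1: [3, 3, 0], 2: [4, 0, 0], 3: [19, 4, 0], 4: [0, 5, 0]}
search = [1, 3, 5]

def cosine(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Similarity of every file to the search vector.
sims = dict((fileNum, cosine(vec, search)) for fileNum, vec in files.items())

# File numbers sorted from most to least similar.
ranking = sorted(sims, key=sims.get, reverse=True)
print(ranking)  # [4, 1, 3, 2]
```

The whole numerator is divided by the whole product of the two magnitudes, which is how the values 0.478, 0.169, 0.26987 and 0.507 come out.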

Upvotes: 1

Views: 501

Answers (1)

pad

Reputation: 2396

   sizeFileVec = {}
   for word, innerDict in wordFrequency.iteritems():
       for fileNum, appearances in innerDict.iteritems():
           sizeFileVec[fileNum] += appearances ** 2

This fails because the key doesn't exist yet, so Python has no existing value to add appearances ** 2 to.

You could do something like,

   sizeFileVec = {}
   for word, innerDict in wordFrequency.iteritems():
       for fileNum, appearances in innerDict.iteritems():
           # create the key with a default value the first time it is seen
           if fileNum not in sizeFileVec:
               sizeFileVec[fileNum] = 0 #your default value
           sizeFileVec[fileNum] += appearances ** 2

(Or use the dict method setdefault for the same effect.) Make the same change around line 18 if you repeat this pattern with a plain dict there; a Counter already treats missing keys as 0.
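
For what it's worth, here is a minimal sketch of the setdefault route, plus collections.defaultdict, which also gives missing keys a default value. It reuses the wordFrequency data from the question; sumSquares and sumSquaresDD are names I made up for illustration, and it uses .items() rather than .iteritems() so that it also runs on Python 3.

```python
import math
from collections import defaultdict

wordFrequency = {
    'bit': {1: 3, 2: 4, 3: 19, 4: 0},
    'red': {1: 0, 2: 0, 3: 15, 4: 0},
    'dog': {1: 3, 2: 0, 3: 4, 4: 5},
}

# Option 1: setdefault inserts 0 for a missing key and returns the stored value.
sumSquares = {}
for innerDict in wordFrequency.values():
    for fileNum, appearances in innerDict.items():
        sumSquares[fileNum] = sumSquares.setdefault(fileNum, 0) + appearances ** 2

# Option 2: defaultdict(int) creates missing keys with the value 0 on first access.
sumSquaresDD = defaultdict(int)
for innerDict in wordFrequency.values():
    for fileNum, appearances in innerDict.items():
        sumSquaresDD[fileNum] += appearances ** 2

# Take the square root once, after every sum is complete.
sizeFileVec = dict((fileNum, math.sqrt(total)) for fileNum, total in sumSquares.items())
print(sizeFileVec)
```

In this sketch the square root is taken only after all the sums are finished. In the question's code the square-root loop appears to sit inside the accumulation loop, so it would run on partial sums; that is probably worth checking once the KeyError is gone.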

Upvotes: 1
