Reputation: 7281
I have two dictionaries that look like so:
dict_of_items = {1: [('dog', 3), ('bird', 0)], 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)], 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)], 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)], 5: [('egret', 4), ('bird', 0)], 6: [('bird', 0)], 7: [('dog', 5), ('bird', 0)], 8: [('bird', 0), ('aardvark', 1)]}
dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}
I need to compute the dot product between dict_of_search and each of the keys in dict_of_items, and then store the resulting dot product values, keeping track of them by key. What I mean is...
In dict_of_items, key 1 and the item in dict_of_search have these vectors:
| | dict_of_items_1 | dict_of_search |
|:----:|:---------------:|:--------------:|
| bird | 0 | 0 |
| dog | 3 | 1 |
| cat | 0 | 3 |
And so my dot product would be: 3
Desired results would be a dictionary of keys in dict_of_items and their respective dot products as compared to dict_of_search (this will only ever be one item), sorted in descending order by dot product.
However, I am not sure how to translate the shape of my dictionaries into two arrays to perform a vector calculation, especially how to handle the case where one of the terms does not appear (for example, in the example above, cat did not appear under key 1 in dict_of_items).
I have tried something like this using numpy:
import numpy as numpy

def main():
    test_arr_1 = [1,2,3]
    test_arr_2 = [3,2,6]
    first_dot_product = numpy.dot(test_arr_1, test_arr_2)
    print("First Example: ", first_dot_product)
    test_arr_3 = [3,0,1]
    test_arr_4 = [2,10]
    second_dot_product = numpy.dot(test_arr_3, test_arr_4)
    print("Second Example Missing Value: ", second_dot_product)

main()
But that fails since the vectors are not of the same size and shape.
ValueError: shapes (3,) and (2,) not aligned: 3 (dim 0) != 2 (dim 0)
I have also tried to reshape the dictionary values into lists:
def main():
    dict_of_items = {'1': [('bird', 0), ('dog', 3), ('egret', 2), ('bird', 0), ('aardvark', 1), ('cat', 3), ('dog', 1), ('bird', 0), ('fish', 6), ('aardvark', 5), ('dog', 1), ('bird', 0), ('fish', 6), ('aardvark', 2), ('egret', 4), ('bird', 0), ('bird', 0), ('bird', 0), ('dog', 5), ('bird', 0), ('aardvark', 1)]}
    test_list_of_lists = []
    for k, v in dict_of_items.items():
        curr_list = []
        for aTuple in v:
            curr_list.append(aTuple[1])
        test_list_of_lists.append(curr_list)
    print(test_list_of_lists)

main()
But that simply merges everything into one list incorrectly: [[0, 3, 2, 0, 1, 3, 1, 0, 6, 5, 1, 0, 6, 2, 4, 0, 0, 0, 5, 0, 1]]
I also took a look at this post, but the dictionary there is in a much simpler format.
Upvotes: 2
Views: 715
Reputation: 5461
It will be easier if you first convert each list of tuples to a dictionary, as below. Then a dict comprehension can compute the dot product for every key:
dict_of_items = {key:dict(value) for key, value in dict_of_items.items()}
dict_of_search = {key:dict(value) for key, value in dict_of_search.items()}
# Terms missing from an item contribute 0 through item.get(key, 0)
result = {item_key: sum([search[key]*item.get(key,0) for key in search.keys()])
          for item_key, item in dict_of_items.items()
          for search in dict_of_search.values()}
print(result)
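If explicit, equal-length vectors are still wanted (as in the table in the question), one possible sketch is to fix a shared vocabulary and fill in 0 for any term a document lacks. The helper name `to_vector`, the sorted vocabulary order, and the shortened sample data are assumptions made for illustration; they are not part of the answer above.

```python
# Sketch: align sparse (term, weight) lists onto one shared vocabulary.
# Only the first two documents from the question are used, to keep it short.
dict_of_items = {1: [('dog', 3), ('bird', 0)],
                 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)]}
dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}

# Every term seen in any document or in the search, in a fixed (sorted) order.
all_pairs = list(dict_of_items.values()) + list(dict_of_search.values())
vocab = sorted({term for pairs in all_pairs for term, _ in pairs})

def to_vector(pairs, vocab):
    """Hypothetical helper: dense vector over vocab, 0 for missing terms."""
    weights = dict(pairs)
    return [weights.get(term, 0) for term in vocab]

search_vec = to_vector(dict_of_search[1], vocab)

# Plain-Python dot product of each aligned document vector with the search vector.
dots = {key: sum(a * b for a, b in zip(to_vector(pairs, vocab), search_vec))
        for key, pairs in dict_of_items.items()}

print(vocab)       # ['aardvark', 'bird', 'cat', 'dog', 'egret']
print(search_vec)  # [0, 0, 3, 1, 0]
print(dots)        # {1: 3, 2: 9}
```

Because the vectors now all have the same length, they could equally be handed to numpy.dot without the shape error shown in the question.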
Upvotes: 0
Reputation: 61910
To compute the dot product of the values in dict_of_search against each entry in dict_of_items, you could do:
def prod(source, target):
    # Union of keys from both dicts; a term missing from either side counts as 0
    return sum(source.get(key, 0) * target.get(key, 0) for key in source.keys() | target.keys())

dict_of_items = {1: [('dog', 3), ('bird', 0)], 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)],
                 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)],
                 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)], 5: [('egret', 4), ('bird', 0)],
                 6: [('bird', 0)], 7: [('dog', 5), ('bird', 0)], 8: [('bird', 0), ('aardvark', 1)]}
dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}

for k, v in dict_of_items.items():
    for se in dict_of_search.values():
        print(k, prod(dict(v), dict(se)))
Output
1 3
2 9
3 1
4 1
5 0
6 0
7 5
8 0
If you want to store the results in a dictionary, do:
result = {}
for k, v in dict_of_items.items():
    for se in dict_of_search.values():
        result[k] = prod(dict(v), dict(se))
print(result)
Output
{1: 3, 2: 9, 3: 1, 4: 1, 5: 0, 6: 0, 7: 5, 8: 0}
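The question also asks for the results sorted in descending order by dot product, which the snippet above does not do. A minimal sketch of that last step follows, assuming Python 3.7+ so that a plain dict preserves insertion order; the name `sorted_result` is introduced here for illustration.

```python
# Dot products keyed by document, as printed in the output above.
result = {1: 3, 2: 9, 3: 1, 4: 1, 5: 0, 6: 0, 7: 5, 8: 0}

# Sort the (key, dot product) pairs by dot product, highest first.
# sorted() is stable, so ties keep their original key order.
sorted_result = dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))

print(sorted_result)
# {2: 9, 7: 5, 1: 3, 3: 1, 4: 1, 5: 0, 6: 0, 8: 0}
```

If only the ranking matters, the list of pairs returned by sorted() can be used directly without converting back to a dict.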
Upvotes: 1