Reputation: 482
I have a dictionary of foods:
foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry"
}
Now I have a list of strings:
queries = ["best burger", "something else"]
I have to find out whether any string in queries has an entry in our foods dictionary. In the above example, it should return True for best burger.
Currently, I am calculating the cosine similarity between each string in the list and every entry in foods.keys(). It works, but it is very slow: the foods dictionary has almost 1000 entries, and every query is compared against all of them. Is there a more efficient way to do this?
Edit:
Here best burger should be returned because it contains burger, and burger is also present in chicken burger in foods.keys(). I am basically trying to find out whether any query is a food type.
This is how I am calculating it:
import re, math
from collections import Counter

WORD = re.compile(r'\w+')

def text_to_vector(text):
    # Not shown in the original post; assumed here to be a plain
    # bag-of-words count built with the WORD pattern above.
    return Counter(WORD.findall(text))

def get_cosine(text1, text2):
    vec1 = text_to_vector(text1.lower())
    vec2 = text_to_vector(text2.lower())
    # Only words present in both texts contribute to the dot product
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum([vec1[x] * vec2[x] for x in intersection])
    sum1 = sum([vec1[x]**2 for x in vec1.keys()])
    sum2 = sum([vec2[x]**2 for x in vec2.keys()])
    denominator = math.sqrt(sum1) * math.sqrt(sum2)
    if not denominator:
        return 0.0
    else:
        # Similarity as a percentage (0 to 100)
        return (float(numerator) / denominator) * 100

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry"
}
queries = ["best burger", "something else"]

flag = False
food = []
for phrase in queries:
    for k in foods.keys():
        cosine = get_cosine(phrase, k)
        if int(cosine) > 40:
            flag = True
            food.append(phrase)
            break  # one matching key is enough for this query
print('Foods:', food)
OUTPUT:
Foods: ['best burger']
Solution:
Although @Black Thunder's solution works for the example I provided, it doesn't work for queries like best burgers, which is a major concern for me. The SequenceMatcher solution does work in that case. Thanks, @Andrej Kesely. Queries like this were the reason I went for cosine similarity in my solution, but I think SequenceMatcher works better here.
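The difference described above can be checked with a small sketch. The helper names `substring_match` and `fuzzy_match` are made up for illustration, and the 0.5 threshold is simply borrowed from the SequenceMatcher answer; neither helper is part of the answers themselves.

```python
from difflib import SequenceMatcher

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}

def substring_match(query, keys):
    """True if any word of the query is a substring of any key."""
    return any(word in key for word in query.split() for key in keys)

def fuzzy_match(query, keys, threshold=0.5):
    """True if any key has a SequenceMatcher ratio above the threshold."""
    return any(
        SequenceMatcher(None, key, query).ratio() > threshold for key in keys
    )

# "burgers" is not contained in any key, so the substring check fails,
# while the fuzzy ratio against "beef burger" is still above 0.5.
print(substring_match("best burgers", foods))  # False
print(fuzzy_match("best burgers", foods))      # True
```

The threshold is a judgment call: a lower value catches more variants but also lets more unrelated queries through.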
Upvotes: 0
Views: 276
Reputation: 195573
You can use difflib (doc) to find similar strings (the 0.5 threshold will probably need some tweaking):
from difflib import SequenceMatcher

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry"
}
queries = ["best burger", "order"]

out = []
for q in queries:
    for k in foods:
        r = SequenceMatcher(None, k, q).ratio()
        print('q={: <20} k={: <20} ratio={}'.format(q, k, r))
        if r > 0.5:
            out.append(k)
print(out)
Prints:
q=best burger k=chicken masala ratio=0.16
q=best burger k=chicken burger ratio=0.64
q=best burger k=beef burger ratio=0.8181818181818182
q=best burger k=chicken soup ratio=0.2608695652173913
q=best burger k=vegetable ratio=0.3
q=order k=chicken masala ratio=0.10526315789473684
q=order k=chicken burger ratio=0.3157894736842105
q=order k=beef burger ratio=0.375
q=order k=chicken soup ratio=0.11764705882352941
q=order k=vegetable ratio=0.14285714285714285
['chicken burger', 'beef burger']
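Since the question is about speed with roughly 1000 keys, here is a hedged sketch of the same comparison that reuses one matcher per query. The function name `close_foods` is hypothetical. It relies on two documented difflib behaviours: `SequenceMatcher` caches information about its second sequence, and `real_quick_ratio()` and `quick_ratio()` are cheap upper bounds on `ratio()`. Whether this is fast enough for a given workload is something to measure, not a guarantee.

```python
from difflib import SequenceMatcher

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}

def close_foods(query, keys, threshold=0.5):
    """Return the keys whose similarity ratio to `query` exceeds `threshold`."""
    matcher = SequenceMatcher(None)
    matcher.set_seq2(query)  # analysed once, reused for every key
    matches = []
    for key in keys:
        matcher.set_seq1(key)
        # Both quick ratios are upper bounds on ratio(), so a key that
        # fails either check cannot pass the full comparison.
        if (matcher.real_quick_ratio() > threshold
                and matcher.quick_ratio() > threshold
                and matcher.ratio() > threshold):
            matches.append(key)
    return matches

print(close_foods("best burger", foods))  # ['chicken burger', 'beef burger']
print(close_foods("order", foods))        # []
```

If only a yes/no answer per query is needed, returning as soon as the first key passes avoids scanning the remaining keys.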
Upvotes: 1
Reputation: 7313
Try this code:
queries = ["best burger", "order"]
foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry"
}
output = []
for y in queries:  # loop through the queries
    for x in y.split(" "):  # split each query into words
        for z in foods:  # iterate over the keys (same as foods.keys())
            if x in z:  # check whether the word is a substring of the key
                output.append(z)  # if it is, record the key
print(output)
Output:
['chicken burger', 'beef burger']
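The three nested loops above scan every key for every query word. A possible variation, sketched here under the assumption that whole-word matches are acceptable (so "burg" would no longer match "burger", unlike the substring check), is to index the key words once and look each query word up directly. The names `build_word_index` and `find_foods` are hypothetical.

```python
from collections import defaultdict

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}

def build_word_index(keys):
    """Map each word to the list of keys that contain it (built once)."""
    index = defaultdict(list)
    for key in keys:
        for word in set(key.lower().split()):
            index[word].append(key)
    return index

def find_foods(query, index):
    """Return the keys sharing at least one whole word with the query."""
    found = []
    for word in query.lower().split():
        for key in index.get(word, []):
            if key not in found:  # avoid duplicates across query words
                found.append(key)
    return found

index = build_word_index(foods)
print(find_foods("best burger", index))     # ['chicken burger', 'beef burger']
print(find_foods("something else", index))  # []
```

Each query then costs one dictionary lookup per word instead of a pass over all keys, at the price of missing inflected forms such as "burgers".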
Upvotes: 1
Reputation: 419
You can do something simple like this.
First, get all the keys:
data = foods.keys()
Now join the list of query strings into a single comma-separated string, which makes substring matching easier:
queries = ','.join(queries)
Now check whether any word of any key occurs in that string:
for food in data:
    for item in food.split():
        if item in queries:
            print(True)
Upvotes: 0
Reputation: 551
If what you want is a list of the queries that exactly match a key in foods, you could use a list comprehension:
matches = [food for food in queries if food in foods]
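A quick check of what the comprehension returns may help. The second query below is a made-up example, not one from the question. Because `in` on a dictionary tests for an exact key, only queries that are themselves keys are kept:

```python
foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}
# "vegetable" is an exact key; "best burger" only shares a word with some keys.
queries = ["best burger", "vegetable"]

matches = [food for food in queries if food in foods]
print(matches)  # ['vegetable']
```

For the queries in the question, this approach returns an empty list, so it fits only when exact key matches are the goal.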
Upvotes: -1