Reputation: 9468
I am writing a python web application for which I need to process search queries having named entities in it. For example, If search query is: "mac os lion" And lets say I have to process this query with the candidates available on my database:
We all know that 3rd result is the right result. But is there any way we could map user's query i.e. "mac os lion" to "Apple Mac OS X Lion" (which is the available entry on my database) Can someone please tell me what to look for or what to do.
Upvotes: 3
Views: 321
Reputation: 3284
You need some sort of normalisation of user queries and have to "learn" a mapping from these to the correct "classes".
A simple way would be computing the overlap of "tokens" matching any of your "classes". The following sample code may help:
CLASSES = ['Google Android', 'Microsoft Windows', 'Apple Mac OS X Lion']
def classify_query(query_string):
"""
Computes the most "likely" class for the given query string.
First normalises the query to lower case, then computes the number of
overlapping tokens for each of the possible classes.
The class(es) with the highest overlap are returned as a list.
"""
query_tokens = query_string.lower().split()
class_tokens = [[x.lower() for x in c.split()] for c in CLASSES]
overlap = [0] * len(CLASSES)
for token in query_tokens:
for index in range(len(CLASSES)):
if token in class_tokens[index]:
overlap[index] += 1
sorted_overlap = [(count, index) for index, count in enumerate(overlap)]
sorted_overlap.sort()
sorted_overlap.reverse()
best_count = sorted_overlap[0][0]
best_classes = []
for count, index in sorted_overlap:
if count == best_count:
best_classes.append(CLASSES[index])
else:
break
return best_classes
Example output
classify_query('mac OS x') -> ['Apple Mac OS X Lion']
classify_query('Google') -> ['Google Android']
Of course, this is only a very basic solution. You might want to add some spell checking to be more robust in case of typos in query strings...
Hope that helps :)
Upvotes: 3
Reputation: 11381
If you only need to find similar text to the query, you could use a text search engine with python binding such as Lucene + PyLucene.
Upvotes: 1