Reputation: 776
Using the GAE Search API, is it possible to search for a partial match?
I'm trying to create autocomplete functionality where the term would be a partial word, e.g.
> b
> bui
> build
would all return "building".
How is this possible with GAE?
Upvotes: 13
Views: 6931
Reputation: 55
Jumping in very late here, but here is my well-documented tokenizing function. The docstring should help you understand and use it. Good luck!
def tokenize(string_to_tokenize, token_min_length=2):
    """Tokenizes a given string into prefix tokens.

    Note: If a word in the string to tokenize is no longer than the
    minimum token length, the whole word is added to the set of tokens
    and skipped from further processing.
    Avoids duplicate tokens by using a set to save the tokens.

    Example usage:
        tokens = tokenize('pack my box', 3)

    Args:
        string_to_tokenize: str, the string we need to tokenize.
            Example: 'pack my box'.
        token_min_length: int, the minimum length we want for a token.
            Example: 3.

    Returns:
        set, containing the tokenized strings. Example:
        set(['box', 'pac', 'my', 'pack'])
    """
    tokens = set()
    token_min_length = token_min_length or 1
    for word in string_to_tokenize.split(' '):
        if len(word) <= token_min_length:
            tokens.add(word)
        else:
            for i in range(token_min_length, len(word) + 1):
                tokens.add(word[:i])
    return tokens
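For example (a set has no guaranteed order, so the result is sorted here for readability):

print(sorted(tokenize('pack my box', 3)))
# -> ['box', 'my', 'pac', 'pack']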
Upvotes: 0
Reputation: 6290
Although a LIKE statement (partial match) is not supported in Full Text Search, you can hack around it.
First, tokenize the data string into all possible substrings (hello = h, he, hel, lo, etc.)
def tokenize_autocomplete(phrase):
    a = []
    for word in phrase.split():
        j = 1
        while True:
            for i in range(len(word) - j + 1):
                a.append(word[i:i + j])
            if j == len(word):
                break
            j += 1
    return a
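For example, a single word expands to every contiguous substring (note the duplicated 'l' tokens):

tokenize_autocomplete('hello')
# -> ['h', 'e', 'l', 'l', 'o', 'he', 'el', 'll', 'lo', 'hel', 'ell', 'llo', 'hell', 'ello', 'hello']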
Build an index + document (Search API) using the tokenized strings
index = search.Index(name='item_autocomplete')
for item in items:  # each item is an ndb.Model entity
    name = ','.join(tokenize_autocomplete(item.name))
    document = search.Document(
        doc_id=item.key.urlsafe(),
        fields=[search.TextField(name='name', value=name)])
    index.put(document)
Perform the search, and voilà!
results = search.Index(name="item_autocomplete").search("name:elo")
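A minimal sketch of consuming those results, assuming the documents were indexed with urlsafe entity keys as doc_id (as in the snippet above), so each hit can be mapped back to its datastore entity:

from google.appengine.api import search
from google.appengine.ext import ndb

results = search.Index(name='item_autocomplete').search('name:elo')
for doc in results:
    # doc_id was set to the entity's urlsafe key above, so fetch the entity
    item = ndb.Key(urlsafe=doc.doc_id).get()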
https://code.luasoftware.com/tutorials/google-app-engine/partial-search-on-gae-with-search-api/
Upvotes: 31
Reputation: 331
My optimized version: it indexes only prefixes and does not repeat tokens:
def tokenization(text):
    tokens = []
    min_length = 3  # minimum token length; renamed to avoid shadowing the built-in min()
    for word in text.split():
        if len(word) > min_length:
            # go up to len(word) + 1 so the full word itself is indexed too
            for i in range(min_length, len(word) + 1):
                token = word[:i]
                if token not in tokens:
                    tokens.append(token)
    return tokens
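For example:

print(tokenization('hello world'))
# -> ['hel', 'hell', 'hello', 'wor', 'worl', 'world']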
Upvotes: 0
Reputation: 1088
Just like @Desmond Lua's answer, but with a different tokenize function:
def tokenize(word):
    token = []
    words = word.split(' ')
    for word in words:
        for i in range(len(word)):
            if i == 0:
                continue
            w = word[i]
            if i == 1:
                token += [word[0] + w]
                continue
            token += [token[-1] + w]
    return ",".join(token)
It will parse hello world as he,hel,hell,hello,wo,wor,worl,world. It's good for lightweight autocomplete purposes.
Upvotes: 3
Reputation: 5336
I had the same problem with a typeahead control, and my solution was to parse the string into small parts:
name = 'hello world'
name_search = ' '.join([name[:i] for i in xrange(2, len(name) + 1)])
print name_search
# -> he hel hell hello hello hello w hello wo hello wor hello worl hello world
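A hedged sketch of how this prefix string could then be indexed with the Search API (the field and index names here are illustrative, not from the original answer):

from google.appengine.api import search

document = search.Document(
    fields=[search.TextField(name='name_search', value=name_search)])
search.Index(name='item_typeahead').put(document)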
Hope this helps!
Upvotes: 0
Reputation: 3115
As described at Full Text Search and LIKE statement, no, it's not possible, since the Search API implements full-text indexing.
Hope this helps!
Upvotes: 2