Chris Dutrow

Reputation: 50362

Substring type text search on google app engine

I see that Google App Engine has now added text search: https://developers.google.com/appengine/docs/python/search/overview

Does this include searching for sub-strings within strings?

The reason I ask is because I had previously written some code that would allow substring search for fields like names and phone numbers. For example, you could search for "San" and it would find results like "Mike DaSantos". This was awesome for stuff like auto-complete.

I ran into problems with cost, though, because of the tremendous number of write operations this required. Each field I indexed this way needed roughly n(n+1)/2 write operations, i.e. O(n^2), for a string of length n, because it involved one write for each substring. This added up to a few dollars of App Engine costs when it came to indexing phone numbers, names, e-mail addresses, and addresses for 6000 customers.
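To make the count concrete, here is a minimal sketch in plain Python (no App Engine calls) that enumerates every contiguous substring of a string and compares the total with n(n+1)/2. The helper name all_substrings is hypothetical and is not the code described above.

```python
def all_substrings(text):
    """Return every contiguous, non-empty substring of text (duplicates kept)."""
    n = len(text)
    # One substring per (start, end) pair with start < end.
    return [text[i:j] for i in range(n) for j in range(i + 1, n + 1)]


name = "Mike DaSantos"
subs = all_substrings(name)
n = len(name)

# The number of substrings equals n(n+1)/2, which grows as O(n^2).
print(len(subs), n * (n + 1) // 2)
# A substring such as "San" is among the indexed values.
print("San" in subs)
```

If each of these values is stored separately, the number of writes per field grows quadratically with the field length.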

I'm wondering if using the search API could provide this functionality for less cost?

Thanks so much!

Upvotes: 3

Views: 1422

Answers (2)

Aaron

Reputation: 11

By the way, you shouldn't need n(n+1)/2 write operations for your own substring search solution.

You should only need 1.

I.e., instead of creating n(n+1)/2 objects, you create ONE object with up to n(n+1)/2 list elements in an ndb.StringProperty(repeated=True).
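A rough sketch of how that list could be built, in plain Python so it runs without the App Engine SDK. The function substring_tokens, the min_len parameter, the lowercasing, and the Customer model in the comment are all assumptions for illustration. The sketch covers only the single entity put; it does not address how the datastore bills for the indexed list values.

```python
def substring_tokens(text, min_len=2):
    """Return the distinct lowercase substrings of text with length >= min_len."""
    text = text.lower()
    n = len(text)
    # A set drops repeated substrings so each value is stored only once.
    tokens = {text[i:j] for i in range(n) for j in range(i + min_len, n + 1)}
    return sorted(tokens)


# Hypothetical model (requires the App Engine SDK, so it is left as a comment):
#
#   class Customer(ndb.Model):
#       name = ndb.StringProperty()
#       name_substrings = ndb.StringProperty(repeated=True)
#
#   Customer(name=name, name_substrings=substring_tokens(name)).put()

tokens = substring_tokens("Mike DaSantos")
# A lowercased query such as "san" can then be matched by equality on the list.
print("san" in tokens)
```

An equality filter on the repeated property, with the query lowercased the same way, would then match any entity whose list contains the search term.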

Upvotes: 1

user1258245

Reputation: 3639

No it doesn't.

The only "wildcard" we can search with is for plurals.

~"car"  # searches for "car" and "cars"

What it can do though is save multiple tokens in the same field. See their example at TextSearchServlet

  StringTokenizer tokenizer = new StringTokenizer(tagStr, ",");
  while (tokenizer.hasMoreTokens()) {
    docBuilder.addField(Field.newBuilder().setName("tag")
        .setAtom(tokenizer.nextToken()));
  }

So you could query a "nametag" field, for example, and, assuming you had tokenized the name into it so that "San" is one of the stored tokens, get "Mike DaSantos" back:

  Results<ScoredDocument> results = getIndex().search("nametag:San"); 

I am not crystal clear on the costs and quotas here.

Upvotes: 4
