Giuc

Reputation: 125

Whoosh - performance issues with wildcard searches (*something)

I'm noticing that searches like *something consume huge amounts of CPU. I'm using Whoosh 2.4.1. I suppose this is because I don't have an index covering this kind of search: something* works fine, but *something doesn't.

How do you deal with these queries? Is there a special way to declare your schema that makes these kinds of queries efficient?

Thanks!

Upvotes: 0

Views: 1063

Answers (1)

Thomas Waldmann

Reputation: 491

That's quite a fundamental problem: prefixes are usually easy to find (like when searching foo*), suffixes are not (like *foo).

A wildcard pattern that starts with a literal prefix gets optimized: first a fast prefix search narrows down the terms, and then the slower wildcard match runs only on the terms found in that first step.
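To illustrate the idea, here is a minimal pure-Python sketch over a sorted list of terms. It is not Whoosh's actual code: the helper names `literal_prefix` and `wildcard_terms` are hypothetical, and `fnmatch` stands in for Whoosh's own wildcard matching. The literal part before the first `*` or `?` narrows the candidates with a binary search, and only those candidates get the slower pattern match. For `*foo` the literal prefix is empty, so every term becomes a candidate.

```python
import bisect
import fnmatch

def literal_prefix(pattern):
    """Return the part of a wildcard pattern before the first * or ?."""
    for i, ch in enumerate(pattern):
        if ch in "*?":
            return pattern[:i]
    return pattern

def wildcard_terms(sorted_terms, pattern):
    """Match pattern against a sorted term list, narrowing by literal prefix first.

    Returns (matching terms, number of candidates that had to be pattern-matched).
    """
    prefix = literal_prefix(pattern)
    # Fast step: binary search for the first term >= prefix, then walk the
    # contiguous block of terms that start with prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    candidates = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        candidates.append(term)
    # Slow step: full pattern match, but only on the candidates.
    matches = [t for t in candidates if fnmatch.fnmatchcase(t, pattern)]
    return matches, len(candidates)

terms = sorted(["football", "foobar", "food", "barfoo", "snafoo", "tofu"])
print(wildcard_terms(terms, "foo*"))  # (['foobar', 'food', 'football'], 3)
print(wildcard_terms(terms, "*foo"))  # (['barfoo', 'snafoo'], 6)
```

With six terms the difference is trivial, but the candidate count for `*foo` grows with the whole vocabulary of the field, while for `foo*` it only grows with the number of terms sharing that prefix.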

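You can't do that optimization when the pattern starts with a wildcard (like *foo): there is no literal prefix to narrow the terms down, so every term in the field has to be matched against the pattern. But there is a trick: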

If you need this often, you could try indexing a reversed copy of the string in a second field (and searching with the reversed search string), so the suffix search becomes a prefix search:

Something like:

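```python
# At index time, also store a reversed copy in a separate title_rev field:
writer.add_document(title=title, title_rev=title[::-1])
...
# At query time, reverse the pattern: u"*foo"[::-1] == u"oof*", a prefix query.
from whoosh.qparser import QueryParser
query = QueryParser("title_rev", ix.schema).parse(u"*foo"[::-1])
```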
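The reversal trick can also be sketched without Whoosh, over a plain sorted term list. The helper names `build_reversed_index` and `suffix_search` are hypothetical and not part of any Whoosh API; in a real index you would let Whoosh do the prefix lookup on the `title_rev` field. Each term is stored reversed, `*foo` is turned into `oof*` exactly as above, and the query is answered with a binary-search prefix lookup instead of a scan. Note that reversing a whole title such as `"hello world"` gives `"dlrow olleh"`, whose words are each reversed, so (assuming a simple word-splitting analyzer) the tokens in `title_rev` are the reversed words.

```python
import bisect

def build_reversed_index(terms):
    """Map each reversed term back to its original term; return sorted keys too."""
    rev_map = {t[::-1]: t for t in terms}
    return sorted(rev_map), rev_map

def suffix_search(rev_keys, rev_map, pattern):
    """Answer a leading-wildcard query like '*foo' as a prefix query on reversed terms."""
    if not pattern.startswith("*") or any(c in "*?" for c in pattern[1:]):
        raise ValueError("this sketch only handles patterns of the form *suffix")
    rev_pattern = pattern[::-1]   # "*foo" -> "oof*"
    prefix = rev_pattern[:-1]     # drop the trailing "*": "oof"
    # Binary search to the first reversed term >= prefix, then walk the block.
    start = bisect.bisect_left(rev_keys, prefix)
    results = []
    for key in rev_keys[start:]:
        if not key.startswith(prefix):
            break
        results.append(rev_map[key])
    return results

keys, rev = build_reversed_index(["football", "foobar", "food", "barfoo", "snafoo", "tofu"])
print(suffix_search(keys, rev, "*foo"))   # ['snafoo', 'barfoo']
print(suffix_search(keys, rev, "*ball"))  # ['football']
```

The cost is storing a second, reversed copy of the field, in exchange for turning a scan over all terms into a binary search plus a walk over the matching block.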

Upvotes: 3
