numegil
numegil

Reputation: 1926

Efficiently searching large list of strings

I have a large list of strings which needs to be searched by the user of an iPhone/Android app. The strings are sorted alphabetically, but that isn't actually all that useful, since a string should be included in the results if the search query falls anywhere inside the string, not just the beginning. As the user is typing their search query, the search should be updated to reflect the results of what they currently entered. (e.g. if they type "cat", it should display the results for "c", "ca" and "cat" as they type it).

My current approach is the following:

I have a stack of "search results", which starts out empty. If the user types something to make the search query longer, I push the current search results onto the stack, then search through only the current search results for the new ones (it's impossible for something to be in the full string list but not the current results in this case).

If the user hits backspace, I only need to pop the search results off of the stack and restore them. This can be done nearly instantaneously.

This approach is working great for "backwards" searching (making the search query shorter) and cases where the search query is already long enough for the number of results to be low. However, it still has to search though the full list of strings in O(n) time for each of the first few letters the user types, which is quite slow.

One approach I've considered is to have a pre-compiled list of results of all possible search queries of 2 or 3 letters. The problem with this approach is that it would require 26^2 or 26^3 such lists, and would take up a pretty large amount of space.

Any other optimizations or alternate approaches you can think of?

Upvotes: 3

Views: 1701

Answers (3)

Rrr
Rrr

Reputation: 1777

You could use Compressed suffix tree

Upvotes: 1

user845279
user845279

Reputation: 2804

You should consider using a prefix tree (trie) to make a precomputed list. I'm not sure showing the result for 'c', 'ca', and 'cat' on a sub-character basis is a good idea. For example, let's say the user is searching for the word 'eat'. Your algorithm would have to find all the words that contain 'e', then 'ea', and finally 'eat'; most of which will be of no use for the user. For a phone app, it would probably be better if you do it on a word basis. A multi-word string can be tokenized so searching for 'stake' in 'a large stake' will work fine, but not searching for 'take'.

Upvotes: 4

Dan Nissenbaum
Dan Nissenbaum

Reputation: 13938

I notice that Google, and I imagine others, do not provide a full list when only 1 or 2 characters have been pressed. In your case, perhaps a good starting point is to only begin populating the search query results when the user has typed a minimum of 3 characters.

For later versions, if it's important, you can take a cue from the way Google does it, and do more sophisticated processing: keep track of the actual entries that previous users have selected, and order these by frequency. Then, run a cron job every day on your server to populate a small database table with the top 10 entries starting with each letter, and if only 1 or 2 letters have been pressed, use the results from this small table instead of scanning the full list.

Upvotes: 1

Related Questions