Reputation: 10536
How can I use a trie (or another data structure or algorithm) to efficiently search for multiple words by prefix?
For example: suppose this is my data set:
A trie data structure allows me to efficiently retrieve all names starting with "Bo" (thus without iterating over all the names). But I also want to search on the last name by prefix, thus searching for "Wa" should find "Bobby Walker". And to complicate things: when the user searches for "Bo Wa" this should also find the same name. How can I implement this? Should I use a separate trie structure for each part of the name? (And how to combine the results)?
Background: I'm writing the search functionality for a big address book (10000+ names). I want to have a really fast autocomplete function that shows results while people are typing the first few letters of the first & last name. I already have a solution that uses a regex, but it needs to iterate over all names which is to slow.
Upvotes: 5
Views: 2772
Reputation: 927
A very good data structure would be a Burst Trie
There's a Scala implementation.
Upvotes: 3
Reputation: 12592
You could try a second trie with the reversed string and a wildcard search:http://phpir.com/tries-and-wildcards/
Upvotes: 2
Reputation: 2643
I think that a sorted array will also fit for your requirements, an array containing Person
objects (they have a firstName
and a lastName
field). Let's say that you have a prefix
and want to find all the values that fit your prefix
. Simply run a binary search to find the first position (let's say is firstIndex
) where your prefix
appears on firstName
and one more to find the last position (lastIndex
). Now you can retrieve your values in O(lastIndex - firstIndex)
. The same goes when you want to find them by lastName
. When you have a prefixFirstName
and a prefixLastName
you can search for the interval where values match for prefixFirstName
and then, on this interval, you can check for the values matching the prefixLastName
. For a conclusion, when you have one or two prefixes you run 4 binary searches (around 17 iterations per search for 100k names) which is fast enough and you can retrieve them in linear time. Even if it isn't the fastest solution, I suggested it since it's easy to understand and easy to code.
Upvotes: 1