Reputation: 2531

How do I modify the prefix trie data structure to handle words that are in the middle?

I want to implement simple autocomplete functionality for a website. I first wanted to use the prefix trie data structure for this, and that's how autocomplete usually works, you enter a prefix and you can search the trie for the possible suffixes, however the product owner wants to handle the words that are in the middle as well.

Let me explain what I mean. Imagine I have these product names:

tile for bathroom
tile for living room
kitchen tile
kitchen tile, black
some other tile, green

The user searches for "tile" and they will only see the first 2 results if I use the prefix trie, but I want all those results to pop up, however I don't know about any efficient data structure to handle this. Can you please suggest something? Can a prefix trie be modified to handle this?

I have thought about some modifications, such as inserting all suffixes, etc, but they will give wrong results, for example, I have inserted suffixes for

kitchen tile, black
some other tile, green

and kept the prefixes in the first node for each suffix (kind of like cartesian product), that way I can get the result of "some other tile, black" which doesn't exist. So this solution is bad. Also this solution will use a lot of memory...

Upvotes: 0

Answers (3)

Jim Mischel

Reputation: 133995

Don't over-think this. Computers are fast. If you're talking on the order of thousands of products in memory, then a sequential search doing a contains check is going to be plenty fast enough: just a few milliseconds, if that.

If you're talking a high-traffic site with thousands of requests per second, or a system with hundreds of thousands of different products, you'll need a better approach. But for a low-traffic site and a few thousand products, do the simple thing first. It's easy to implement and easy to prove correct. Then, if it's not fast enough, you can worry about optimizing it.

Upvotes: 2

arenard

Reputation: 672

The trie data structure indeed works for prefix match operations, not for in the middle text search

The usual data structure to support in the middle text search is the suffix tree: https://en.wikipedia.org/wiki/Suffix_tree

It requires enough space to store about 20 times your list of words in memory, so yes it costs more memory

Suffix array is a space efficient alternative: https://en.wikipedia.org/wiki/Suffix_array

Upvotes: 2

zenwraight

Reputation: 2000

I have an approach that will work using simple tries.

Assumption:- User will see Sentence once the whole word is complete

Let's take above example to understand this approach.

1. Take each sentence, say tile for bathroom. 
2. Split the sentences into words as - tile, for, bathroom.
3. Create a tuple of [String, String], so for above example we will get three tuples.
  (i)  [tile, tile for bathroom]
  (ii) [for, tile for bathroom]
  (iii)[bathroom, tile for bathroom]
4. Now insert the first String of the above tuple into your trie and store the 
other tuple (which is the whole sentence) as a String object to the 
last character node for the word. i.e when inserting tile, the node at 
character e will store the sentence string value.
5. One case to handle here is, like the tile word appears in two strings, so in that case
the last character e will store a List of string having values - tile for bathroom and tile for living room.

Once you have the trie ready based on the above approach, you would be able to search the sentence based on any word being used in that sentence. In short, we are creating each word of the sentence as tag.

Let me know, if you need more clarity on the above approach.

Hope this helps!

Upvotes: 1

How do I modify the prefix trie data structure to handle words that are in the middle?

Answers (3)

Related Questions