Kaushal Shah
Kaushal Shah

Reputation: 85

Data Structure for word suggestions from a text file

Problem: We are given a text file containing many lines of text. Now the user will input a few letters and we have to give an autocomplete suggestion based on the text in the file we are given. Let's say the file contains computer science is fun. computer engineering is awesome. Now if the user typescom we need to give as suggestion computer science and computer engineering. If the user types is the suggestion should be fun and awesome. The user can input any word that might or might not be in the text file. If the word is not in the file there should be no suggestion.

What would be the best data structure for this problem.
I know that we can build a trie but with that we might only be able to suggest computer when the user types com.

Appreciate any help.

Upvotes: 1

Views: 367

Answers (1)

fjardon
fjardon

Reputation: 7996

Preparation:

  1. Read all the lines of the text file as an array of string
  2. Sort lexicographically this array

Query:

  1. Get the lower bound index given the input string: first
  2. Increase the value of the last character of your input string by 1 (if not at its max value) and get the lower bound index, last, for this new input string. If your last character cannot incremented, use the index one past the end of your arrray.

All the possible suggestions are in the sorted array between these two bounds not including the last index: [first, last).

If there are too many suggestions, you can filter by only suggesting the 3 shortest suggestions, or sort by statistical frequencies.

You can also print the number of suggestions instead of suggesting them. Similar to the way google tells you how many pages match your query. And then only suggest strings when the number of matches is manageable by your UI.

Upvotes: 1

Related Questions