What would be the best (practice) way to store data about occurrences and positions of words in a text so that it's quickly accessible?

Question

I'm about to start writing a program which will analyze a text and store all the unique words in the text in some form which can later be called upon. When called upon it will give the position of all occurrences of this word in the original text and return the surrounding words as well.

I think the best way to do this would be to use a hashmap because it works with the unique words as a key and then an int[] as the mapped values. But I don't know if this is considered best practice or not. My solution would have one array to store the original text, which might be quite big, and one hashmap with one key-value pair for each unique word which might be almost as large as the array containing the text. How would you solve it?

m.raynal · Accepted Answer

An alternative possibility is a 26-ary tree (considering your alphabet has 26 characters).
Build your tree storing words you encounter, each node will represent a word ; then in each node you can store an array of pointers pointing towards occurrences of the words in the strings (or an array of int representing indexes).
In terms of memory and complexity, it is equivalent to the hash map implementation (same speed, slightly more compact), but it seems a bit more intuitive to me than the hash map.
So I'd say it's mainly up to you and your favorite structures.

What would be the best (practice) way to store data about occurrences and positions of words in a text so that it's quickly accessible?

Answers (2)

Related Questions

What would be the best (practice) way to store data about occurrences and positions of words in a text so that it&#39;s quickly accessible?

Answers (2)

Related Questions

What would be the best (practice) way to store data about occurrences and positions of words in a text so that it's quickly accessible?