Reputation: 81
I'm about to start writing a program which will analyze a text and store all the unique words in the text in some form which can later be called upon. When called upon it will give the position of all occurrences of this word in the original text and return the surrounding words as well.
I think the best way to do this would be to use a hashmap because it works with the unique words as a key and then an int[] as the mapped values. But I don't know if this is considered best practice or not. My solution would have one array to store the original text, which might be quite big, and one hashmap with one key-value pair for each unique word which might be almost as large as the array containing the text. How would you solve it?
Upvotes: 0
Views: 54
Reputation: 8589
A hash-maps are made for this kind of task. You should probably map strings to a structure (rather than an int array). That structure might record position and previous and next word - it's not precisely clear what you mean by 'surrounding'.
You may have to decide if your process is case sensitive. Are "You" and "you" the same word? Depending on the language you may be able to provide a case-insensitive comparator and hashing function or need to 'low case' all the entries.
Upvotes: 1
Reputation: 3143
An alternative possibility is a 26-ary tree (considering your alphabet has 26 characters).
Build your tree storing words you encounter, each node will represent a word ; then in each node you can store an array of pointers pointing towards occurrences of the words in the strings (or an array of int representing indexes).
In terms of memory and complexity, it is equivalent to the hash map implementation (same speed, slightly more compact), but it seems a bit more intuitive to me than the hash map.
So I'd say it's mainly up to you and your favorite structures.
Upvotes: 1