user579674

Reputation: 2179

What is the best data structure to store the words found in a document along with a count of their occurrences?

Let's say I have a corpus of documents that I want to read one by one and store in a data structure. The structure will probably be a list of objects, where each object is an instance of a class that represents a single document. Inside that class I'll need a data structure to store the contents of each document. What should that be? Also, if I want to count occurrences of words and retrieve the most frequent words in each document, do I need a data structure that lets me do this in less than the O(n) time it would take to examine all the contents sequentially?

Upvotes: 1

Views: 4441

Answers (1)

Boris Pavlović

Reputation: 64632

Use an associative array, also called a map or dictionary; different programming languages use different terms for the same data structure.

Each entry's key would be a word, and its value would be that word's counter. For example:

{
  'on' -> 15,
  'and' -> 43,
  'I' -> 157,
  'confluence' -> 1,
  'dear' -> 2
}
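As a rough illustration of the idea, here is a minimal Python sketch that builds such a map with `collections.Counter`, a dictionary subclass from the standard library. The helper name `count_words`, the sample sentence, and the choices to lowercase the text and to split on runs of word characters are all assumptions made for this example; a real corpus would likely need a more careful tokenizer.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Map each word in `text` to the number of times it occurs."""
    # Assumption: a "word" is a run of letters, digits, underscores, or
    # apostrophes, and case is ignored. Adjust to suit the corpus.
    words = re.findall(r"[\w']+", text.lower())
    return Counter(words)


# Hypothetical document contents, for illustration only.
counts = count_words("I came, I saw, and I left. Dear dear me.")

# The two most frequent words, as (word, count) pairs.
top_two = counts.most_common(2)
print(top_two)
```

Building the counts still requires one pass over the document, but once the map exists, looking up a single word's count takes constant time on average, and `most_common(k)` only has to look at the distinct words rather than rereading the whole text.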

Upvotes: 2
