Reputation: 5407
I am trying to process content of article or any paragraph [each string]. First I will convert into words using strtok()
.
After that I want store each word in hashtable (because I think it's only best way to process big data). While dealing with each word, I want to store occurrence of each words. And at the end I want to get words which occurring max time.
unordered_map
stores elements with key values and allows fast retrieval of elements with key. This may be useful to me.
I am not good with C++, so want some opinions.
Storing entire contains in char *ch ="content of article"
is good way to proceed or string::str? I am familiar with first one only. for 2nd I feel complex during working with functions.
Storing entire content(strings) into unordered_map()
, then How can I create hashtable which contains element as words, and it's occurrence with it. And then Can I get words with max occurance?
Is there any other C++ function which can help me out to do what I want.
Upvotes: 0
Views: 3844
Reputation: 263320
I want store each word in hashtable (because I think it's only best way to process big data). While dealing with each word, I want to store occurrence of each words.
Here is some pseudo C++ to get you started:
std::unordered_map<std::string, int> occurrences;
while (more_words_available)
{
std::string word = fetch_next_word();
++occurrences[word];
}
how I print occurrence count values for each word in while?
Do you have a C++11 compiler? Then use the new foreach loop:
for (auto p : occurrences)
{
std::cout << p.first << " occurred " << p.second << " times.\n";
}
Otherwise, use the traditional for loop with iterators:
for (std::unordered_map<std::string, int>::iterator it = occurrences.begin();
it != occurrences.end();
++it)
{
std::cout << it->first << " occurred " << it->second << " times.\n";
}
Upvotes: 2
Reputation: 3801
If your article is in file test.txt
then you may create your map like that:
#include<fstream>
#include<map>
#include<string>
using namespace std;
int main()
{
ifstream in_file("test.txt");
map<string,int> words;
string tword;
while(in_file >> tword) //line 12
words[tword]++;
}
You can also store entire content in istringstream ss
and use it instead of in_file
above:
while(ss >> twords) //line 12
Upvotes: 1
Reputation: 154027
You don't need (nor want) strtok
. If white space is the
separator for words, just reading into a string using >>
will
do the trick; the entire input phase would be:
std::unordered_map<std::string, int> counts;
std::string word;
while ( source >> word ) {
++ counts[word];
}
Depending on the requirements, you might want to do things like
converting the word to lower case before counting it, or
stripping final punctuation from it (so that word
, Word
and
Word.
are all the same).
For accessing sorted by count, the simplest would be to copy the
contents of the map into an std::vector<std::pair<std::string,
int>>
and sorting that. (Don't forget that you can construct
a vector from two iterators. So this is just two more lines.)
Upvotes: 1
Reputation: 2505
It seems the data structure you need will need to do several operations: looking up by key (word) and string values (count) for each key. But you also want to be able to print the frequent works, in which case you needs sorting by value.
None of the standard containers handle this out of the box. Since the first operation will be happening frequently, and the second just once, you should select container that suites best for the first operation.
Both std::map and std::unordered_map would do well.
Try following:
std::map<std::string, int>
or
unordered_map std::map<std::string, int>
For printing all the works in order of frequency, you will have to copy it into another structure and then sort it. Or it in in one operation. You can copy everything into
std::map<int, std::string>
and then just print it.
Upvotes: 3
Reputation: 151
Upvotes: 1