Reputation: 147
I have an array of words and a text file. I want to search the text file for each word in the array and count how many times each one appears.
I thought about using a for loop, but that only gave me the total word count, not an individual count for each word. I can't put the text file into an array, as there are about 40000 words in it.
After counting, I want to divide each count by an integer value called 'scale', and then repeat a string that many times (the scaled count).
I am currently doing it as shown below. Is there any way I can make this more efficient?
Any help is greatly appreciated.
Array of words = testwords.
Name of file = testF.
inWord = each word in the file.
while (testF >> inWord)
{
if (inWord == testwords[0]){
count1++;
}
if (inWord == testwords[1]){
count2++;
}
if (inWord == testwords[2]){
count3++;
}
if (inWord == testwords[3]){
count4++;
}
if (inWord == testwords[4]){
count5++;
}
if (inWord == testwords[5]){
count6++;
}
if (inWord == testwords[6]){
count7++;
}
if (inWord == testwords[7]){
count8++;
}
}
cout << testwords[0] << " " << count1 << " " << s1.append(count1/scale, '*') << endl;
cout << testwords[1] << " " << count2 << " " << s2.append(count2/scale, '*') << endl;
cout << testwords[2] << " " << count3 << " " << s3.append(count3/scale, '*') << endl;
cout << testwords[3] << " " << count4 << " " << s4.append(count4/scale, '*') << endl;
cout << testwords[4] << " " << count5 << " " << s5.append(count5/scale, '*') << endl;
cout << testwords[5] << " " << count6 << " " << s6.append(count6/scale, '*') << endl;
cout << testwords[6] << " " << count7 << " " << s7.append(count7/scale, '*') << endl;
cout << testwords[7] << " " << count8 << " " << s8.append(count8/scale, '*') << endl;
Upvotes: 2
Views: 185
Reputation: 28882
All of the other answers here are very good suggestions. One small optimization you could make is to use else in your existing code.
if (inWord == testwords[0])
{
count1++;
}
if (inWord == testwords[1])
{
count2++;
}
could be replaced by
if (inWord == testwords[0])
{
count1++;
}
else if (inWord == testwords[1])
{
count2++;
}
The idea is that if inWord matches element 0, it cannot match any of the other elements (assuming the words are distinct), so the remaining comparisons can be skipped.
In any case, profilers are your friend.
Upvotes: 0
Reputation: 20087
With only 8 values to compare, you can most likely find a better hash function than the one in the standard library. It might use only the first two characters, the last character, or the string length:
while (std::cin >> word) {
int i=my_hash(word);
if (word==my_sparse_hash_table[i].word) my_sparse_hash_table[i].count++;
}
Just using your method:
while (std::cin >> word) {
for (int i=0;i<N;i++)
if (word == myTable[i].word) { myTable[i].count++; break; }
} // break out of the inner loop as soon as a match is found
Micro-optimizations include moving a found entry towards the beginning of the array myTable.
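As a rough illustration of that last point, here is a minimal sketch of the linear search with an early exit plus a swap-with-predecessor step (a "transpose" heuristic, which is one way of moving a found entry towards the front). The Entry struct and the countWord function are hypothetical names, not taken from the code above, and the sketch assumes C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: one tracked word and its running count.
struct Entry {
    std::string word;
    unsigned long count;
};

// Linear search with early exit. On a hit, the entry is swapped one
// position towards the front, so frequently seen words drift to the
// start of the table and are found after fewer comparisons.
// Returns true if `word` is one of the tracked words.
bool countWord(std::vector<Entry>& table, const std::string& word) {
    assert(!word.empty());  // operator>> never yields an empty word
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].word == word) {
            ++table[i].count;
            if (i > 0) {
                std::swap(table[i], table[i - 1]);
            }
            return true;  // stop at the first match
        }
    }
    return false;  // not a tracked word
}
```

Calling countWord(table, word) inside the while (std::cin >> word) loop replaces the inner for loop. Note that the table order changes as words are found, so it no longer matches the order of the original word array; whether the reordering pays off for only 8 words is something to measure rather than assume.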
Upvotes: 0
Reputation: 227608
Set up an std::map<std::string, unsigned long long>, scan through the document word by word, and increment the counter for each word:
std::map<std::string, unsigned long long> wordMap;
std::string word; // read words into this string
...
wordMap[word]++; // increase counter each time a word is found. First call will insert 0.
Then you can loop over your array of words, checking the entries in the map:
for (unsigned int i = 0; i < nWords; ++i)
{
std::cout << "Word " << testWords[i] << " was found " << wordMap[testWords[i]] << " times\n";
}
Each time a new word is found, wordMap[word] will insert a key-value pair word : 0 before the increment is applied.
If you have C++11, you can also try std::unordered_map and pick whichever performs best.
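To tie this back to the output format in the question, the following is a minimal sketch that counts every word from a stream and then builds one line per tracked word, with a bar of '*' characters whose length is the count divided by scale. The names WordMap, countAll and report are made up for this sketch, and reading from a generic std::istream (rather than the testF file stream) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, unsigned long long> WordMap;

// Count every whitespace-separated word read from the stream.
WordMap countAll(std::istream& in) {
    WordMap wordMap;
    std::string word;
    while (in >> word) {
        ++wordMap[word];  // first access inserts the word with count 0
    }
    return wordMap;
}

// Build one line per tracked word: "<word> <count> <stars>".
// The number of stars is count / scale (integer division).
std::string report(const WordMap& wordMap,
                   const std::vector<std::string>& testWords,
                   unsigned long long scale) {
    assert(scale > 0);  // dividing by zero would be undefined
    std::ostringstream out;
    for (std::size_t i = 0; i < testWords.size(); ++i) {
        // find() does not insert, so absent words simply count as 0
        WordMap::const_iterator it = wordMap.find(testWords[i]);
        unsigned long long n = (it == wordMap.end()) ? 0 : it->second;
        std::string stars(static_cast<std::size_t>(n / scale), '*');
        out << testWords[i] << " " << n << " " << stars << "\n";
    }
    return out.str();
}
```

With a file stream, usage would look roughly like std::cout << report(countAll(testF), words, scale); where words is a std::vector<std::string> holding the tracked words. Building the bar with the std::string(count, char) constructor also avoids needing one separate string variable per word.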
Upvotes: 1
Reputation: 33467
Before you worry about efficiency, you should worry about approach. You're not using logical data structures. Instead of having 8 separate counts, keep an array of counts. Or better yet, keep a map of word -> count.
Luckily, in this situation, cleaner code will also correspond to much faster execution.
In particular, use an std::map<std::string, size_t>.
Alternatively, if you're using C++11, you could use a std::unordered_map for likely better performance.
Assuming you're reading your words from cin:
std::map<std::string, size_t> counts;
std::string word;
while (std::cin >> word) {
++counts[word];
}
for (std::map<std::string, size_t>::const_iterator it = counts.begin(),
end = counts.end(); it != end; ++it) {
std::cout << "The word '" << it->first << "' appeared "
<< it->second << " times" << std::endl;
}
Documentation for std::map.
Documentation for std::unordered_map.
For what it's worth, std::unordered_map is (in practically every implementation) a hash table, and std::map is (in practically every implementation) backed by a balanced binary search tree, typically a red-black tree.
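If only the handful of words from the array matter, one variation (a swapped-in technique, not what the code above does) is to seed an std::unordered_map with just those words and ignore everything else, so the table stays at 8 entries no matter how large the file is. This is a minimal sketch assuming C++11; Counts, countTracked and countTrackedInText are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::unordered_map<std::string, std::size_t> Counts;

// Count only the words listed in `tracked`; all other words are skipped.
Counts countTracked(std::istream& in, const std::vector<std::string>& tracked) {
    Counts counts;
    for (std::size_t i = 0; i < tracked.size(); ++i) {
        counts[tracked[i]] = 0;  // seed every tracked word with zero
    }
    assert(counts.size() <= tracked.size());  // duplicates collapse into one entry

    std::string word;
    while (in >> word) {
        // find() never inserts, so untracked words do not grow the table
        Counts::iterator it = counts.find(word);
        if (it != counts.end()) {
            ++it->second;
        }
    }
    return counts;
}

// Convenience wrapper for trying the function on an in-memory string.
Counts countTrackedInText(const std::string& text,
                          const std::vector<std::string>& tracked) {
    std::istringstream in(text);
    return countTracked(in, tracked);
}
```

The trade-off is that counts for untracked words are not available afterwards. Whether this beats a plain std::map for 40000 words is worth measuring rather than assuming.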
Upvotes: 4