James Warner

Reputation: 147

C++ How would I make this code more efficient?

I have an array of words and a text file. I want to search through the text file and count the number of times each word in the array appears in it.

I thought about using a for loop, but that only gave me the total count across all the words, not an individual count for each one. I can't put the text file into an array, as there are about 40000 words in it.

After the count, I want to divide each count by an integer value known as 'scale', and then multiply a string by the resulting number (that is, repeat it that many times).

I am currently doing it as shown below. Is there any way I can make this more efficient?

Any help is greatly appreciated.

Array of words = testwords.

Name of file = testF.

inWord = each word in the file.

while (testF >> inWord) {
    if (inWord == testwords[0]) {
        count1++;
    }
    if (inWord == testwords[1]) {
        count2++;
    }
    if (inWord == testwords[2]) {
        count3++;
    }
    if (inWord == testwords[3]) {
        count4++;
    }
    if (inWord == testwords[4]) {
        count5++;
    }
    if (inWord == testwords[5]) {
        count6++;
    }
    if (inWord == testwords[6]) {
        count7++;
    }
    if (inWord == testwords[7]) {
        count8++;
    }
}
cout << testwords[0] << " " << count1 << " " << s1.append(count1/scale, '*') << endl;
cout << testwords[1] << " " << count2 << " " << s2.append(count2/scale, '*') << endl;
cout << testwords[2] << " " << count3 << " " << s3.append(count3/scale, '*') << endl;
cout << testwords[3] << " " << count4 << " " << s4.append(count4/scale, '*') << endl;
cout << testwords[4] << " " << count5 << " " << s5.append(count5/scale, '*') << endl;
cout << testwords[5] << " " << count6 << " " << s6.append(count6/scale, '*') << endl;
cout << testwords[6] << " " << count7 << " " << s7.append(count7/scale, '*') << endl;
cout << testwords[7] << " " << count8 << " " << s8.append(count8/scale, '*') << endl;

Upvotes: 2

Views: 185

Answers (4)

EvilTeach

Reputation: 28882

All of the other answers here are very good suggestions. One small optimization you could make is to use else in your existing code.

if (inWord == testwords[0])
{
    count1++;
}
if (inWord == testwords[1])
{
    count2++;
}

could be replaced by

if (inWord == testwords[0])
{
    count1++;
}
else if (inWord == testwords[1])
{
    count2++;
}

The idea is that if inWord matches element 0, it cannot match any of the other elements (assuming the words in testwords are distinct), so the remaining comparisons can be skipped.

In any case, profilers are your friend.

Upvotes: 0

Aki Suihkonen

Reputation: 20087

With only 8 values to compare, you can most likely find a better hash function than the one in the standard library. It may consist of only the first two characters, or the last character, or the string length:

while (std::cin >> word) {
  int i=my_hash(word);
  if (word==my_sparse_hash_table[i].word) my_sparse_hash_table[i].count++;
}

Just using your method:

while (std::cin >> word) {
   for (int i=0;i<N;i++) 
     if (word == myTable[i].word) { myTable[i].count++; break; }
}  // break out of the loop as early as possible

Micro-optimizations include moving a found entry towards the beginning of the array myTable.
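
The small custom hash table mentioned at the top of this answer could be sketched roughly as follows. Everything named here (kSlots, tinyHash, Slot, buildTable, tallyHashed) is hypothetical, and the hash (first character plus length) is only one guess at what might separate a particular set of 8 words. The assert in buildTable fires if two search words land in the same slot, in which case a different hash is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table size; much larger than 8 so collisions are less likely.
const std::size_t kSlots = 64;

// Hypothetical hash: first character plus length, folded into kSlots.
std::size_t tinyHash(const std::string& w) {
    if (w.empty()) return 0;
    return (static_cast<unsigned char>(w[0]) + w.size()) % kSlots;
}

struct Slot {
    std::string word;   // empty means "slot unused"
    std::size_t count;
    Slot() : count(0) {}
};

// Places each search word in its slot; the hash must not collide on this word set.
std::vector<Slot> buildTable(const std::vector<std::string>& words) {
    std::vector<Slot> table(kSlots);
    for (std::size_t i = 0; i < words.size(); ++i) {
        Slot& s = table[tinyHash(words[i])];
        assert(s.word.empty() && "hash collision: pick a different hash");
        s.word = words[i];
    }
    return table;
}

// One hash and at most one string comparison per word read from the file.
void tallyHashed(std::vector<Slot>& table, const std::string& inWord) {
    Slot& s = table[tinyHash(inWord)];
    if (!inWord.empty() && inWord == s.word) ++s.count;
}
```

In the question's loop this would be called as tallyHashed(table, inWord) inside while (testF >> inWord). Whether it actually beats the plain linear scan for 8 words is something only a profiler can tell.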

Upvotes: 0

juanchopanza

Reputation: 227608

Set up an std::map<std::string, unsigned long long>, scan through the document word by word, and increment the counter for each word:

std::map<std::string, unsigned long long> wordMap;

std::string word; // read words into this string
...
wordMap[word]++; // increase counter each time a word is found. First call will insert 0.

Then you can loop over your array of words, checking the entries in the map:

for (unsigned int i = 0; i < nWords; ++i)
{
  std::cout << "Word " << testWords[i] << " was found " << wordMap[testWords[i]] << " times\n";
}

Each time a new word is found, wordMap[word] will insert the key-value pair word : 0 before incrementing it.

If you have C++11, you can also try an std::unordered_map and pick whichever performs best.
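
To connect this back to the 'scale' step in the question, here is a minimal sketch of the reporting side. The names WordMap, lookup and report are made up for illustration. It uses find rather than operator[] so that looking up a word that never appeared does not insert a new entry, and it assumes scale is positive.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, unsigned long long> WordMap;

// Returns the count for `word`, or 0 if absent, without inserting a new entry.
unsigned long long lookup(const WordMap& wordMap, const std::string& word) {
    WordMap::const_iterator it = wordMap.find(word);
    return it == wordMap.end() ? 0 : it->second;
}

// Builds one "word count stars" line per search word,
// with one '*' for every `scale` occurrences (integer division).
std::string report(const WordMap& wordMap,
                   const std::vector<std::string>& testWords,
                   unsigned long long scale) {
    assert(scale > 0);  // guards the division below
    std::ostringstream out;
    for (std::size_t i = 0; i < testWords.size(); ++i) {
        const unsigned long long count = lookup(wordMap, testWords[i]);
        out << testWords[i] << " " << count << " "
            << std::string(static_cast<std::size_t>(count / scale), '*') << "\n";
    }
    return out.str();
}
```

After the counting loop has filled wordMap, std::cout << report(wordMap, testWords, scale); would replace the eight separate output statements from the question.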

Upvotes: 1

Corbin

Reputation: 33467

Before you worry about efficiency, you should worry about approach. You're not using logical data structures. Instead of having 8 separate counts, keep an array of counts. Or better yet, keep a map of word -> count.

Luckily, in this situation, cleaner code will also correspond to much faster execution.

In particular, use an std::map<std::string, size_t>.

Alternatively, if you're using C++11, you could use a std::unordered_map for likely better performance.

Assuming you're reading your words from cin:

std::map<std::string, size_t> counts;

std::string word;

while (std::cin >> word) {
    ++counts[word];
}

for (std::map<std::string, size_t>::const_iterator it = counts.begin(),
     end = counts.end(); it != end; ++it) {
    std::cout << "The word '" << it->first << "' appeared "
              << it->second << " times" << std::endl;
}

Documentation for std::map.

Documentation for std::unordered_map.

For what it's worth, std::unordered_map is (in practice, always) implemented as a hash table, and std::map is (in practice, always) implemented using a balanced binary search tree as the backing structure.
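
If only the words in the question's array matter, one possible variation (assuming C++11) is to seed an std::unordered_map with just those words and ignore everything else, so the map never grows beyond the search list. The names Counts, makeCounts and tallyTarget are hypothetical, and this is a sketch rather than a measured improvement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::unordered_map<std::string, std::size_t> Counts;

// Seeds the map with the search words only, each starting at zero.
Counts makeCounts(const std::vector<std::string>& testwords) {
    assert(!testwords.empty());  // nothing to count otherwise
    Counts counts;
    for (const std::string& w : testwords) counts[w] = 0;
    return counts;
}

// Increments the counter only if `inWord` is one of the search words;
// find() never inserts, so other words from the file are not stored.
void tallyTarget(Counts& counts, const std::string& inWord) {
    Counts::iterator it = counts.find(inWord);
    if (it != counts.end()) ++it->second;
}
```

With the question's names, this would be Counts counts = makeCounts(testwords); followed by while (testF >> inWord) tallyTarget(counts, inWord);.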

Upvotes: 4
