Arun Kumar
Arun Kumar

Reputation: 694

How can the runtime efficiency of this function be improved?

I am trying to write optimal code (good runtime efficiency) for removing duplicate words in a sentence.

For example the input string Jack Juliet Juliet Jill Jack Romeo Jack Jill to the function should return Jack Juliet Jill Romeo

The following is my code :

std::string removeDuplicateWords(const std::string& str)
{
    std::stringstream ss (str);
    std::unordered_set<std::string> string_history;
    std::string current_string, output_string;
    ss >> current_string;
    string_history.insert(current_string);
    output_string += current_string;
    while(ss >> current_string) {
        if(string_history.find(current_string) == string_history.end()) {
            output_string += " " + current_string;
            string_history.insert(current_string);
        }
    }
    return output_string;
}

Upvotes: 0

Views: 91

Answers (2)

calynr
calynr

Reputation: 1272

Improvements:

  • You don't need to check whether the string exists because unordered_set does it for you.
  • current_string is moved, because we no longer need it.

    std::string removeDuplicateWordsImproved( const std::string& str )
    {
        std::stringstream ss (str);
        std::unordered_set<std::string> string_history;
        std::string current_string, output_string;
    
        while( ss >> current_string )
            string_history.insert( std::move( current_string ) );
    
        for ( auto& word : string_history )
            output_string.append( std::move( word ) ).push_back( ' ' );
    
        output_string.erase( output_string.size() - 1 );
    
        return output_string;
    }
    

Upvotes: 0

JVApen
JVApen

Reputation: 11317

For this example, gaining performance doesn't make much sense, if you only have about 10 words to process, use the easier code. However, there are some things you can do to improve performance.

First of all, you don't want to check check if an element is in the set. insert returns you a pair of iterator-bool, where the first is the iterator to the element with the key (either existing or new) and the second is a Boolean that indicates if an insert happened.

      if (string_history.insert(current_string).second)
          output_string += " " + current_string;

This both simplifies code and improves performance. As already indicated in comments, usingstd::move makes sense, however, if you use it with insert, you need the iterator to get a hold of the moved to object. For this case, it won't help, as you are in the SSO cases.

A second thing that requires more code, is to remove std::stringstream. Streams give some overhead. Instead, you can better accept a std::string_view and use substring logic to cut it into pieces. This prevents the creation of new string objects and removes overhead from streams. You do need C++17 (or a library providing this).

Finally, you can reserve output_string at the beginning, you know the maximum size (aka size of str). This can prevent reallocations if you end up out of the SSO range.

Upvotes: 1

Related Questions