Reputation: 694
I am trying to write optimal code (good runtime efficiency) for removing duplicate words in a sentence.
For example the input string Jack Juliet Juliet Jill Jack Romeo Jack Jill
to the function should return Jack Juliet Jill Romeo
The following is my code :
std::string removeDuplicateWords(const std::string& str)
{
std::stringstream ss (str);
std::unordered_set<std::string> string_history;
std::string current_string, output_string;
ss >> current_string;
string_history.insert(current_string);
output_string += current_string;
while(ss >> current_string) {
if(string_history.find(current_string) == string_history.end()) {
output_string += " " + current_string;
string_history.insert(current_string);
}
}
return output_string;
}
Upvotes: 0
Views: 91
Reputation: 1272
Improvements:
current_string is moved, because we no longer need it.
std::string removeDuplicateWordsImproved( const std::string& str )
{
std::stringstream ss (str);
std::unordered_set<std::string> string_history;
std::string current_string, output_string;
while( ss >> current_string )
string_history.insert( std::move( current_string ) );
for ( auto& word : string_history )
output_string.append( std::move( word ) ).push_back( ' ' );
output_string.erase( output_string.size() - 1 );
return output_string;
}
Upvotes: 0
Reputation: 11317
For this example, gaining performance doesn't make much sense, if you only have about 10 words to process, use the easier code. However, there are some things you can do to improve performance.
First of all, you don't want to check check if an element is in the set. insert
returns you a pair of iterator-bool, where the first is the iterator to the element with the key (either existing or new) and the second is a Boolean that indicates if an insert happened.
if (string_history.insert(current_string).second)
output_string += " " + current_string;
This both simplifies code and improves performance. As already indicated in comments, usingstd::move
makes sense, however, if you use it with insert, you need the iterator to get a hold of the moved to object.
For this case, it won't help, as you are in the SSO cases.
A second thing that requires more code, is to remove std::stringstream
. Streams give some overhead. Instead, you can better accept a std::string_view
and use substring logic to cut it into pieces. This prevents the creation of new string objects and removes overhead from streams. You do need C++17 (or a library providing this).
Finally, you can reserve output_string
at the beginning, you know the maximum size (aka size of str). This can prevent reallocations if you end up out of the SSO range.
Upvotes: 1