molita

Reputation: 145

How to efficiently remove double quotes from std::string if they exist

This question risks being a duplicate (e.g. "remove double quotes from a string in c++"), but none of the answers I saw addresses my question.
I have a list of strings, some of which are double-quoted and some of which aren't. Quotes are always at the beginning and end.

std::vector<std::string> words = boost::assign::list_of("words")( "\"some\"")( "of which")( "\"might\"")("be quoted");

I am looking for the most efficient way to remove the quotes. Here is my attempt

for(std::vector<std::string>::iterator pos = words.begin(); pos != words.end(); ++pos)
{
  boost::algorithm::replace_first(*pos, "\"", "");
  boost::algorithm::replace_last(*pos, "\"", "");
  cout << *pos << endl;
}

Can I do better than this? I potentially have hundreds of thousands of strings to process. They may come from a file or from a database. The std::vector in the example is just for illustration purposes.

Upvotes: 12

Views: 19744

Answers (4)

Andreas Spindler

Reputation: 8120

The most efficient way for modern C++ is:

  if (str.size() > 1) {
    if (str.front() == '"' && str.back() == '"') {
      if (str.size() == 2) {
        str.erase();
      } else {
        str.erase(str.begin());
        str.erase(str.end() - 1);
      }
    }
  }

Rationale:

  • The erase() function modifies the string instead of reallocating it.
  • Calling front() on empty strings triggers undefined behavior.
  • This code leaves room for the compiler to deduce the intent of the two erase calls and optimize them further (removing the first and last characters together is a standard problem).
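As a minimal sketch, the same check can be wrapped in a small helper. The name strip_quotes is hypothetical and not part of the answer above. This variant drops the trailing quote first with pop_back(), so the following erase shifts one character fewer. It assumes C++11 or later for front(), back() and pop_back().

```cpp
#include <string>

// Hypothetical helper (name chosen for illustration): removes one leading and
// one trailing double quote, but only when both are present.
void strip_quotes(std::string& s) {
    // The size check also protects front()/back(), which are undefined on an empty string.
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        s.pop_back();   // drop the trailing quote first: constant time
        s.erase(0, 1);  // drop the leading quote: shifts the remaining characters left
    }
}
```

Strings that are empty, unquoted, or consist of a single quote character are left unchanged.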

Upvotes: 1

Seth Carnegie

Reputation: 75130

It would probably be fast to do a check:

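for (auto i = words.begin(); i != words.end(); ++i) {
    if (i->empty())
        continue; // begin()/rbegin() must not be dereferenced on an empty string
    if (*(i->begin()) == '"') {
        if (i->length() > 1 && *(i->rbegin()) == '"')
            *i = i->substr(1, i->length() - 2); // quoted on both ends
        else
            *i = i->substr(1, i->length() - 1); // leading quote only
    } else if (*(i->rbegin()) == '"') {
        *i = i->substr(0, i->length() - 1);     // trailing quote only
    }
}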

It might not be the prettiest thing ever, but it's O(n) with a small constant.

Upvotes: 5

uesp

Reputation: 6204

This is how I would approach the situation:

  • Start Simple: Begin with the simplest approach that does the job, like Potatoswatter's answer.
  • Don't Store Quoted Strings: If you can help it, don't store quoted strings at all. Check and unquote strings wherever you are creating the std::vector<std::string> in the first place. If you are simply receiving a std::vector<std::string>, there isn't much you can do, as removing the first quote will require copying the rest of the string.
  • Profile/Benchmark: You may be surprised how fast a few hundred thousand strings can be iterated through, and how little any amount of micro-optimizing will get you in the end. There will always be some cases where you do need every little bit of speed, but make sure you understand how to achieve the biggest gains (which profiling will tell you).
  • Worst Case: If you absolutely have to prevent copying the entire string when unquoting then store an index/iterator to the first "real" character. This may actually be slower with "short" strings but may work with "long" strings (i.e., megabytes in size). You could also create, or find, a string class that handles moving the string start without copying but this would be my last choice.
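The "Worst Case" idea of pointing at the first real character instead of copying can be sketched with std::string_view. This assumes C++17, which postdates this answer, and the helper name unquoted is hypothetical. The view does not own any characters, so it is only valid while the original string is alive and unmodified.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (name chosen for illustration): returns a view of `s`
// without its surrounding double quotes. No characters are copied or moved.
std::string_view unquoted(std::string_view s) {
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        s.remove_prefix(1);  // skip the leading quote
        s.remove_suffix(1);  // ignore the trailing quote
    }
    return s;
}
```

A std::string converts implicitly to std::string_view, so an element of the vector can be passed directly, for example unquoted(words[1]). Whether this beats erasing in place for short strings is something only profiling can tell.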

Upvotes: -3

Potatoswatter

Reputation: 137800

If you know the quotes will always appear in the first and last positions, you can do simply

if ( s.size() >= 2 && s.front() == '"' ) { // size check: front() is undefined on an empty string
    s.erase( 0, 1 ); // erase the first character
    s.erase( s.size() - 1 ); // erase the last character
}

The complexity is still linear in the size of the string. You cannot insert or remove from the beginning of a std::string in O(1) time. If it is acceptable to replace the character with a space, then do that.
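If replacing the quotes with spaces is acceptable, a minimal sketch could look like the following. The helper name blank_quotes is hypothetical. Nothing is shifted or reallocated, so the work per string is constant, but the string keeps its original length and the caller has to tolerate the padding.

```cpp
#include <string>

// Hypothetical helper (name chosen for illustration): overwrites a matching
// pair of surrounding double quotes with spaces instead of erasing them.
void blank_quotes(std::string& s) {
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        s.front() = ' ';  // overwrite in place: no characters are moved
        s.back() = ' ';
    }
}
```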

Upvotes: 26
