Joe

Reputation: 346

Remove whitespace from string excluding parts between pairs of " and ' C++

I want to erase all the whitespace from a std::string object, except for the parts inside double or single quotes (so basically string literals), e.g.:

Hello, World! I am a string

Would result in:

Hello,World!Iamastring

However things within speech marks/quote marks would be ignored:

"Hello, World!" I am a string

Would result in:

"Hello, World!"Iamastring

Or:

Hello,' World! I' am a string

Would be:

Hello,' World! I'amastring

Is there a simple routine to do this to a string, either one built into the standard library or an example of how to write my own? It doesn't have to be the most efficient one possible, as it will only be run once or twice every time the program runs.

Upvotes: 2

Views: 1204

Answers (4)

MORTAL

Reputation: 383

You may use the erase-remove idiom like this (as written, it assumes the quoted part comes first in the string):

#include <string>
#include <iostream>
#include <algorithm>


int main()
{
    std::string str("\"Hello, World!\" I am a string");

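    // everything up to and including the last double quote is kept as-is
    std::size_t x = str.find_last_of("\"");
    std::string split1 = str.substr(0, ++x);
    // the rest of the string is the unquoted tail
    std::string split2 = str.substr(x);

    // strip any literal backslashes from the quoted part
    split1.erase(std::remove(split1.begin(), split1.end(), '\\'), split1.end());

    // erase-remove: strip the spaces from the unquoted tail
    split2.erase(std::remove(split2.begin(), split2.end(), ' '), split2.end());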

    std::cout << split1 + split2;
}
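
The same compact-then-erase idea can be stretched to cover quoted parts anywhere in the string. What follows is only a minimal sketch under some assumptions: `erase_spaces_outside_quotes` is a made-up name (not a standard routine), only plain spaces are removed, and backslash escapes are not handled. It keeps a read and a write position, copies the characters to keep towards the front, and erases the leftover tail at the end, much like the erase step of erase-remove.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (an illustration, not part of the standard library):
// removes spaces that are outside "..." or '...' sections, in place.
void erase_spaces_outside_quotes(std::string& s)
{
    std::size_t write = 0;   // next position to store a kept character at
    char open_quote = '\0';  // quote that opened the current section, '\0' if none

    for (std::size_t read = 0; read < s.size(); ++read)
    {
        const char c = s[read];
        bool keep = true;

        if (open_quote != '\0')
        {
            // inside quotes: keep everything; the matching quote closes the section
            if (c == open_quote) open_quote = '\0';
        }
        else if (c == '"' || c == '\'')
        {
            open_quote = c;  // a quoted section starts here
        }
        else if (c == ' ')
        {
            keep = false;    // unquoted space: drop it
        }

        if (keep) s[write++] = c;
    }

    s.erase(write);  // cut off the stale tail left behind by the compaction
}
```

Calling it on a string holding `I am "a b" c` should leave `Iam"a b"c` in that string.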

Upvotes: 1

Joe

Reputation: 346

Here we go. I ended up iterating through the string, and if it finds either a " or a ', it will flip the ignore flag. If the ignore flag is true and the current character is not a " or a ', the iterator just increments until it either reaches the end of the string or finds another "/'. If the ignore flag is false, it will remove the current character if it's whitespace (either space, newline or tab).

EDIT: this code now supports ignoring escaped characters (\", \') and making sure a string starting with a " ends with a ", and a string starting with a ' ends with a ', ignoring anything else in between.

#include <iostream>
#include <string>

int main() {
    std::string str("I am some code, with \"A string here\", but not here\\\". 'This sentence \" should not end yet', now it should. There is also 'a string here' too.\n");
    char quote = '\0';   // the quote character that opened the current string, or '\0' if none
    bool ignore = false; // whether we are inside a quoted string
    for (std::string::iterator it=str.begin(); it!=str.end();)
    {
        // ignore escaped characters
        if ((*it) == '\\')
        {
            // skip the backslash and the escaped character, without running past the end
            ++it;
            if (it != str.end()) ++it;
        }
        else
        {
            if ((*it) == '"' || (*it) == '\'')
            {
                if (ignore) // within a string
                {
                    if ((*it) == quote)
                    {
                        // end of the string
                        ignore = false;
                        quote = '\0';
                    }
                }
                else // outside of a string, so one must be starting.
                {
                    quote = *it;
                    ignore = true;
                }
                it++;
            }
            else
            {
                if (!ignore)
                {
                    if ((*it) == ' ' || (*it) == '\n' || (*it) == '\t')
                    {
                        it = str.erase(it);
                    }
                    else
                    {
                        it++;
                    }
                }
                else
                {
                    it++;
                }
            }
        }
    }
    std::cout << "string now is: " << str << std::endl;
    return 0;
}

Upvotes: 3

Dúthomhas

Reputation: 10083

Argh, and here I spent time writing this (simple) version:

#include <cctype>
#include <ciso646>
#include <iostream>
#include <string>

template <typename Predicate>
std::string remove_unquoted_chars( const std::string& s, Predicate p )
{
  bool skip = false;
  char q = '\0';
  std::string result;

  for (char c : s)
    if (skip) 
    {
      result.append( 1, c );
      skip = false;
    }
    else if (q)
    {
      result.append( 1, c );
      skip = (c == '\\');
      if (c == q) q = '\0';
    }
    else 
    {
      if (!std::isspace( static_cast<unsigned char>( c ) )) 
        result.append( 1, c );
      q = p( c ) ? c : '\0';
    }

  return result;
}

std::string remove_unquoted_whitespace( const std::string& s )
{
  return remove_unquoted_chars( s, []( char c ) -> bool { return (c == '"') or (c == '\''); } );
}

int main()
{
  std::string s;
  std::cout << "s? ";
  std::getline( std::cin, s );
  std::cout << remove_unquoted_whitespace( s ) << "\n";
}

Removes all whitespace except what is inside a single-quoted or double-quoted C-style string (the predicate decides which characters count as quote characters), taking care to respect escaped characters.
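
If the set of characters to remove should be configurable as well, a second predicate could be passed in. This is only a sketch of that idea, not the code above: `remove_unquoted_if`, `is_quote`, `should_remove` and `remove_unquoted_digits` are names invented for illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical variant: the caller supplies both what counts as a quote
// character (is_quote) and what to drop outside quotes (should_remove).
template <typename QuotePred, typename RemovePred>
std::string remove_unquoted_if( const std::string& s, QuotePred is_quote, RemovePred should_remove )
{
  bool skip = false;  // true when the previous quoted character was a backslash
  char q = '\0';      // quote that opened the current string, '\0' if none
  std::string result;

  for (char c : s)
  {
    if (skip)
    {
      // escaped character inside quotes: copy it without interpreting it
      result += c;
      skip = false;
    }
    else if (q != '\0')
    {
      // inside quotes: copy everything, watch for escapes and the closing quote
      result += c;
      skip = (c == '\\');
      if (c == q) q = '\0';
    }
    else
    {
      // outside quotes: drop what the caller asked for, note an opening quote
      if (!should_remove( c )) result += c;
      if (is_quote( c )) q = c;
    }
  }

  return result;
}

// Example use: strip digits that are outside single or double quotes.
std::string remove_unquoted_digits( const std::string& s )
{
  return remove_unquoted_if(
    s,
    []( char c ) { return c == '"' || c == '\''; },
    []( char c ) { return std::isdigit( static_cast<unsigned char>( c ) ) != 0; } );
}
```

With this, `remove_unquoted_digits` applied to `a1 'b2' c3` should give `a 'b2' c`: the digit inside the quotes survives, the other two do not.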

Upvotes: 2

gsamaras

Reputation: 73394

No, there is no such ready-made routine.

You may build your own though.

You have to loop over the string and use a flag. If the flag is true, you delete the spaces; if it is false, you keep them. The flag is true while you are outside quotes, and false while you are inside them.

Here is a naive, not widely tested example:

#include <string>
#include <iostream>
using namespace std;

int main() {
    // we will copy the result in new string for simplicity
    // of course you can do it inplace. This takes into account only
    // double quotes. Easy to extend to single ones though!
    string str("\"Hello, World!\" I am a string");
    string new_str = "";
    // flags for when to delete spaces or not
    // 'start' helps you find if you are in an area of double quotes
    // If you are, then don't delete the spaces, otherwise, do delete
    bool delete_spaces = true, start = false;
    for(unsigned int i = 0; i < str.size(); ++i) {
        if(str[i] == '\"') {
            start = !start;
            if(start) {
                delete_spaces = false;
            }
        }
        if(!start) {
            delete_spaces = true;
        }
        if(delete_spaces) {
            if(str[i] != ' ') {
                new_str += str[i];
            }
        } else {
            new_str += str[i];
        }

    }
    cout << "new_str=|" << new_str << "|\n";
    return 0;
}

Output:

new_str=|"Hello, World!"Iamastring|
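
Extending this to single quotes mostly means remembering which quote character opened the current section, so that a ' inside "..." (or the other way round) does not close it. A rough sketch, assuming only plain spaces are removed and backslash escapes are not handled; `strip_spaces_outside_quotes` is a made-up name for illustration:

```cpp
#include <string>

// Hypothetical helper: copies `in` to a new string, dropping spaces
// that are outside "..." or '...' sections.
std::string strip_spaces_outside_quotes(const std::string& in) {
    std::string out;
    char open_quote = '\0';  // quote that opened the current section, '\0' if none

    for (char c : in) {
        if (open_quote != '\0') {
            // inside quotes: keep everything; only the matching quote closes the section
            out += c;
            if (c == open_quote) {
                open_quote = '\0';
            }
        } else if (c == '"' || c == '\'') {
            // a quoted section starts here
            open_quote = c;
            out += c;
        } else if (c != ' ') {
            // outside quotes: keep everything except spaces
            out += c;
        }
    }
    return out;
}
```

For the question's second example, Hello,' World! I' am a string, this should return Hello,' World! I'amastring.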

Upvotes: 4
