Nicolas Charvoz

Reputation: 1509

Trim / Remove useless whitespace and tab from a string

Can anyone suggest a way of stripping tab characters ("\t") from a std::string?

I know that I can do a lot with:

str.erase (std::remove (str.begin(), str.end(), ' '), str.end());

But that removes every space.

For example, I want this:

push int32(45) or __WT__ push int32(45) __WT__

to become this:

push int32(45)

That is, a string with exactly one space between keywords.

Thanks in anticipation.

Upvotes: 0

Views: 2995

Answers (4)

JuniorCompressor

Reputation: 20025

You can write a template trim function, implemented in a similar way to remove_if:

#include <string>
#include <iterator>
#include <iostream>
#include <ctype.h>
#include <sstream>
using namespace std;

template <class ForwardIterator, class OutputIterator, class UnaryPredicate>
void trim (
  ForwardIterator first, ForwardIterator last, OutputIterator result,
  UnaryPredicate pred
) {
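  // Skip leading junk.
  while (first != last && pred(*first))
    ++first;
  // p != last means a run of junk was seen and one separator is still owed.
  for (ForwardIterator p = last; first != last; ++first) {
    if (pred(*first))
      p = first;
    else {
      if (p != last) {
        *result++ = ' ';  // collapse the whole run (spaces or tabs) to one space
        p = last;
      }
      *result++ = *first;
    }
  }
  // Trailing junk is never written: a separator is emitted only before a kept character.
}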

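inline bool isJunk(char c) {
  // isspace has undefined behaviour for negative values, so convert first
  return isspace(static_cast<unsigned char>(c)) != 0;
}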

inline string trim_string(string s) {
  ostringstream result;
  trim(s.begin(), s.end(), ostream_iterator<char>(result, ""), isJunk);
  return result.str();
}

int main() {
  cout << trim_string(" What     the    fraaak    ") << "." << endl;
}

Output:

What the fraaak.

Upvotes: 1

fredoverflow

Reputation: 263390

"I can only use C++98, regex are for C++11"

Here is an efficient in-place solution that makes a single pass over the string, does not require any external libraries, and works in C++98:

template<typename FwdIter>
FwdIter replace_whitespace_by_one_space(FwdIter begin, FwdIter end)
{
    FwdIter dst = begin;
IGNORE_LEADING_WHITESPACE:
    if (begin == end) return dst;
    switch (*begin)
    {
    case ' ':
    case '\t':
        ++begin;
        goto IGNORE_LEADING_WHITESPACE;
    }
COPY_NON_WHITESPACE:
    if (begin == end) return dst;
    switch (*begin)
    {
    default:
        *dst++ = *begin++;
        goto COPY_NON_WHITESPACE;
    case ' ':
    case '\t':
        ++begin;
        // INTENTIONAL FALLTHROUGH
    }
LOOK_FOR_NEXT_NON_WHITESPACE:
    if (begin == end) return dst;
    switch (*begin)
    {
    case ' ':
    case '\t':
        ++begin;
        goto LOOK_FOR_NEXT_NON_WHITESPACE;
    default:
        *dst++ = ' ';
        *dst++ = *begin++;
        goto COPY_NON_WHITESPACE;
    }
}

Note that gotos are generally considered to be perfectly acceptable in generated code for finite automata, although in this case, I must admit the code was generated by my brain and fingers ;)

Here is an example of how you might use the proposed solution:

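#include <iostream>
#include <string>

int main()
{
    std::string example = "\t\t\tpush \t \t42\t\t\t";
    // C++98 has no type-deducing auto, so spell out the iterator type.
    std::string::iterator new_end = replace_whitespace_by_one_space(example.begin(), example.end());
    example.erase(new_end, example.end());
    std::cout << "[" << example << "]\n";  // prints [push 42]
}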
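
For readers who would rather avoid goto, the same three states (skipping leading whitespace, copying, and skipping an inner run) can probably be expressed with two flags and a plain loop. This is only a sketch: the name collapse_blanks is made up for illustration, it treats just ' ' and '\t' as whitespace like the code above, and it is meant to stay C++98-compatible.

```cpp
#include <string>

// Hypothetical name; a loop-based sketch of the same automaton.
// Compacts [begin, end) in place and returns the new logical end.
template<typename FwdIter>
FwdIter collapse_blanks(FwdIter begin, FwdIter end)
{
    FwdIter dst = begin;
    bool wrote_any = false;      // false while still skipping leading whitespace
    bool pending_space = false;  // true after a whitespace run that follows output

    for (; begin != end; ++begin)
    {
        if (*begin == ' ' || *begin == '\t')
        {
            // Remember the run, but only if something was already written.
            pending_space = wrote_any;
        }
        else
        {
            if (pending_space)
            {
                *dst++ = ' ';    // one space stands in for the whole run
                pending_space = false;
            }
            *dst++ = *begin;
            wrote_any = true;
        }
    }
    return dst;                  // trailing whitespace is never written
}
```

It is used the same way as the goto version: pass the returned iterator to erase to drop the leftover tail of the string.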

Upvotes: 2

Axalo

Reputation: 2953

For those who can't use C++11, here is a simple non-regex solution:

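#include <string>

// Defined below; returns the number of replaced substrings.
unsigned int ReplaceString(std::string &str, const std::string &search,
                           const std::string &replace);

void RemoveWhitespace(std::string &s)
{
    // all tabs to spaces
    ReplaceString(s, "\t", " ");

    // all double spaces to single spaces
    while (ReplaceString(s, "  ", " ") != 0) {}

    // trim the string: at most one space is left at each end by now
    // (back()/pop_back()/front() are C++11, so use C++98 equivalents)
    if (!s.empty() && s[s.size() - 1] == ' ') s.erase(s.size() - 1);
    if (!s.empty() && s[0] == ' ') s.erase(0, 1);
}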

Where ReplaceString may be implemented as

// returns the number of replaced substrings
unsigned int ReplaceString(std::string &str, const std::string &search,
                           const std::string &replace)
{
    unsigned int count = 0;

    size_t pos = 0;
    while ((pos = str.find(search, pos)) != std::string::npos)
    {
        str.replace(pos, search.length(), replace);
        pos += replace.length();
        ++count;
    }

    return count;
}

Upvotes: 0

eerorika

Reputation: 238491

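If you want to replace all consecutive whitespace with a single space, you can do that easily with a trivial regular expression. If your compiler supports the current standard, it should have regex utilities in the standard library, but if you're limited to C++98, you can use an external library instead. Note that this replaces a leading or trailing run of whitespace with a single space rather than removing it, so trim the ends separately if you need that. Here's a solution using one such library, Boost.Regex: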

test = boost::regex_replace(test, boost::regex("\\s+"), " ");
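
If C++11 is available, the standard library equivalent might look like the sketch below. The helper name collapse_whitespace is invented for illustration, and the trimming step is an addition on top of the one-liner above, since the regex replacement alone leaves one space at each end.

```cpp
#include <regex>
#include <string>

// Hypothetical helper name; not part of the original answer.
// Collapses every run of whitespace to one space, then drops the
// single space that may be left at either end.
std::string collapse_whitespace(const std::string &input)
{
    // "\\s+" matches one or more whitespace characters (space, tab, newline, ...).
    static const std::regex runs("\\s+");
    std::string out = std::regex_replace(input, runs, " ");

    // regex_replace does not trim, so remove a leftover space at each end.
    if (!out.empty() && out[0] == ' ')
        out.erase(0, 1);
    if (!out.empty() && out[out.size() - 1] == ' ')
        out.erase(out.size() - 1);
    return out;
}
```

Calling collapse_whitespace("\t push \t int32(45)  ") should give "push int32(45)".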

Upvotes: 0
