user297850

Reputation: 8015

string analysis

A string may contain several unwanted characters, such as @, #, $, and %.

How can I find and delete them?

I know this requires a loop, but I do not know how to represent characters such as @, #, $, and % in code.

If you can give me a code example, I would really appreciate it.

Upvotes: 6

Views: 5091

Answers (10)

Jerry Coffin

Reputation: 490358

I think for this I'd use std::remove_copy_if:

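#include <string>
#include <algorithm>
#include <iterator>   // std::back_inserter
#include <iostream>

// Predicate: true for the characters that should be dropped.
struct bad_char { 
    bool operator()(char ch) const { 
        return ch == '@' || ch == '#' || ch == '$' || ch == '%';
    }
};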

int main() { 
    std::string in("This@is#a$string%with@extra#stuff$to%ignore");
    std::string out;
    std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out), bad_char());
    std::cout << out << "\n";
    return 0;
}

Result:

Thisisastringwithextrastufftoignore

Since the data containing these unwanted characters will normally come from a file of some sort, it's also worth considering getting rid of them as you read the data from the file instead of reading the unwanted data into a string, and then filtering it out. To do this, you could create a facet that classifies the unwanted characters as white space:

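// Headers needed by the facet and by the main() that uses it.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>
#include <vector>

struct filter: std::ctype<char> 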
{
    filter(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::mask());

        rc['@'] = std::ctype_base::space;
        rc['#'] = std::ctype_base::space;
        rc['$'] = std::ctype_base::space;
        rc['%'] = std::ctype_base::space;
        return &rc[0];
    }
};

To use this, you imbue the input stream with a locale using this facet, and then read normally. For the moment I'll use an istringstream, though you'd normally use something like an istream or ifstream:

int main() { 
    std::istringstream in("This@is#a$string%with@extra#stuff$to%ignore");
    in.imbue(std::locale(std::locale(), new filter));

    std::copy(std::istream_iterator<char>(in), 
        std::istream_iterator<char>(), 
        std::ostream_iterator<char>(std::cout));

    return 0;
}

Upvotes: 3

Cubbi

Reputation: 47448

The usual standard C++ approach would be the erase/remove idiom:

#include <string>
#include <algorithm>
#include <iostream>
struct OneOf {
        std::string chars;
        OneOf(const std::string& s) : chars(s) {}
        bool operator()(char c) const {
                return chars.find_first_of(c) != std::string::npos;
        }
};
int main()
{
    std::string s = "string with @, #, $, %";
    s.erase(std::remove_if(s.begin(), s.end(), OneOf("@#$%")), s.end());
    std::cout << s << '\n';
}

Boost also offers some neat ways to write it more concisely, for example using boost::erase_all_regex:

#include <string>
#include <iostream>
#include <boost/algorithm/string/regex.hpp>
int main()
{
    std::string s = "string with @, #, $, %";
    boost::erase_all_regex(s, boost::regex("[@#$%]"));
    std::cout << s << '\n';
}

Upvotes: 13

fingerprint211b

Reputation: 1186

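Something like this would do:

#include <algorithm>
#include <string>

// Returns true for the characters we want to strip.
bool is_bad(char c)
{
  return c == '@' || c == '#' || c == '$' || c == '%';
}

int main(int argc, char **argv)
{
  std::string str = "a #test #@string";
  // remove_if shifts the kept characters forward; erase trims the leftover tail.
  str.erase(std::remove_if(str.begin(), str.end(), is_bad), str.end());
}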

If your compiler supports lambdas (or if you can use Boost), it can be made even shorter. Example using boost::lambda:

  // Requires #include <boost/lambda/lambda.hpp> and using namespace boost::lambda;
  std::string str = "a #test #@string";
  str.erase(std::remove_if(str.begin(), str.end(), (_1 == '@' || _1 == '#' || _1 == '$' || _1 == '%')), str.end());

(yay two lines!)
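For compilers with C++11 lambda support, the same erase/remove call can take a native lambda instead of boost::lambda. A minimal sketch, assuming C++11 or later; the helper name strip_bad_chars is made up for illustration:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from the answer above): returns a copy of str
// with '@', '#', '$' and '%' removed, using a C++11 lambda as the predicate.
std::string strip_bad_chars(std::string str)
{
    str.erase(std::remove_if(str.begin(), str.end(),
                             [](char c) {
                                 return c == '@' || c == '#' || c == '$' || c == '%';
                             }),
              str.end());
    return str;
}
```

Calling strip_bad_chars("a #test #@string") would give "a test string".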

Upvotes: 1

Mark B

Reputation: 96281

You can use a loop and call find_last_of (http://www.cplusplus.com/reference/string/string/find_last_of/) repeatedly to find the last unwanted character, erase it (or replace it with a blank), and then continue working backwards through the string.
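A rough sketch of that backwards loop, assuming the goal is to erase the characters rather than blank them out. The function name erase_from_back and its parameters are hypothetical:

```cpp
#include <string>

// Hypothetical helper: erases every character listed in `unwanted` from a
// copy of `s`, scanning from the back with std::string::find_last_of.
std::string erase_from_back(std::string s, const std::string& unwanted)
{
    std::string::size_type pos = s.find_last_of(unwanted);
    while (pos != std::string::npos) {
        s.erase(pos, 1);              // drop the unwanted character
        if (pos == 0)
            break;                    // nothing left in front of it
        // Everything at or after pos is already known to be clean,
        // so resume the search just before it.
        pos = s.find_last_of(unwanted, pos - 1);
    }
    return s;
}
```

Working from the back means each erase only shifts characters that have already been checked, so the search position never has to be adjusted for the shift.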

Upvotes: 1

Necrolis

Reputation: 26171

Use character literals, i.e. the character a is written 'a'. You haven't said whether you're using C++ strings (in which case you can use the find and replace methods) or C strings, in which case you'd use something like this (this is by no means the best way, but it's a simple way):

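#include <string.h>

void RemoveChar(char* szString, char c)
{
    while(*szString != '\0')
    {
        if(*szString == c)
            // shift the tail (terminator included) left by one; memmove because the ranges overlap
            memmove(szString,szString+1,strlen(szString+1)+1);
        else
            szString++; // only advance when nothing was removed, so runs of c are handled
    }
}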

Upvotes: 1

Amardeep AC9MF

Reputation: 19054

General algorithm:

  1. Build a string that contains the characters you want purged: "@#$%"
  2. Iterate character by character over the subject string.
  3. Search if each character is found in the purge set.
  4. If a character matches, discard it.
  5. If a character doesn't match, append it to a result string.

Depending on the string library you are using, there are functions/methods that implement one or more of the above steps, such as strchr() or find() to determine if a character is in a string.
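The steps above could be sketched in C++ roughly as follows, assuming std::string and its find() member for the membership test. The function name purge and its parameter names are illustrative only:

```cpp
#include <string>

// Hypothetical helper: returns a copy of `subject` without any character
// that appears in `purge_set` (e.g. "@#$%").
std::string purge(const std::string& subject, const std::string& purge_set)
{
    std::string result;
    result.reserve(subject.size());   // the result can never be longer than the input

    for (std::string::size_type i = 0; i < subject.size(); ++i) {
        // find() returns npos when the character is not in the purge set
        if (purge_set.find(subject[i]) == std::string::npos)
            result += subject[i];     // keep it
        // otherwise discard it by simply not copying
    }
    return result;
}
```

With the purge set "@#$%", purge("This@is#a$string%with@extra#stuff$to%ignore", "@#$%") would return "Thisisastringwithextrastufftoignore".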

Upvotes: 2

Jakob

Reputation: 24370

And if you, for some reason, have to do it yourself C-style, something like this would work:

char* oldstr = ... something something dark side ...

size_t oldstrlen = strlen(oldstr); // length without the terminating null
char* newstr = new char[oldstrlen+1]; // allocate memory for the new nicer string (+1 for the terminator)
char* p = newstr; // get a pointer to the beginning of the new string

for ( size_t i=0; i<oldstrlen; i++ ) // iterate over the original string, excluding its terminator
    if (oldstr[i] != '@' && oldstr[i] != '#' /* && etc. */) // check that the current character is not a bad one
      *p++ = oldstr[i]; // append it to the new string
*p = 0; // don't forget the null-termination

Upvotes: 3

Vicky

Reputation: 13244

Is this C or C++? (You've tagged it both ways.)

In pure C, you pretty much have to loop through character by character and delete the unwanted ones. For example:

char *buf; /* must point to a writable, null-terminated string before use */
int len = strlen(buf);
int i, j;

for (i = 0; i < len; i++)
{
    if (buf[i] == '@' || buf[i] == '#' || buf[i] == '$' /* etc */)
    {
        for (j = i; j < len; j++)
        { 
            buf[j] = buf[j+1];
        }
        i --;
    }
}

This isn't very efficient - it checks each character in turn and shuffles them all up if there's one you don't want. You have to decrement the index afterwards to make sure you check the new next character.
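The shuffling can be avoided by keeping separate read and write positions, so each character is copied at most once. A minimal sketch of that variation, assuming buf is a writable, null-terminated string; the name strip_in_place is hypothetical:

```cpp
#include <cstring>

// Hypothetical helper: removes '@', '#', '$' and '%' from buf in a single
// pass by compacting the kept characters towards the front.
void strip_in_place(char* buf)
{
    char* dst = buf;                          // next position to write to
    for (const char* src = buf; *src != '\0'; ++src) {
        const char c = *src;
        if (c != '@' && c != '#' && c != '$' && c != '%')
            *dst++ = c;                       // keep this character
    }
    *dst = '\0';                              // terminate the shortened string
}
```

Because the write position never overtakes the read position, no temporary buffer is needed and the work is linear in the string length.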

Upvotes: 2

Android Eve

Reputation: 14974

A character is represented in C/C++ by single quotes, e.g. '@', '#', etc. (except for a few that need to be escaped).

To search for a character in a string, use strchr(). Here is a link to some sample code:

http://www.cplusplus.com/reference/clibrary/cstring/strchr/
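As a small illustration, strchr() can also be turned around to ask whether a given character belongs to the set of unwanted ones. This is only a sketch; the helper name is_unwanted and the hard-coded set "@#$%" are assumptions for the example:

```cpp
#include <cstring>

// Hypothetical helper: true if c is one of the unwanted characters.
// strchr() also matches the terminating '\0' of the set string,
// so that case is excluded explicitly.
bool is_unwanted(char c)
{
    return c != '\0' && std::strchr("@#$%", c) != nullptr;
}
```

A predicate like this can be used inside a hand-written loop or passed to std::remove_if.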

Upvotes: -1

user195488

Reputation:

If you want to get fancy, there is Boost.Regex; otherwise you can use the STL replace function in combination with the strchr function.

Upvotes: 3
