TN888
TN888

Reputation: 7739

Remove repeating characters from string

I have a string, like e.g. acaddef or bbaaddgg. I have to remove from it, as fast as possible, all repeating characters. So, for example, pooaatat after should look like poat and ggaatpop should look like gatpo. Is there any built-in function or algorithm to do that quickly? I tried to search STL, but without satisfaing result.

Upvotes: 3

Views: 3469

Answers (3)

user123
user123

Reputation: 9071

Okay, so here are 4 different solutions.

Fixed Array

std::string str = "pooaatat";

// Prints "poat"
short count[256] = {0};
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](unsigned char c) { return count[c]++ == 0; });

Count Algorithm + Iterator

std::string str = "pooaatat";

// Prints "poat"
std::string::iterator iter = str.begin();
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return !std::count(str.begin(), iter++, c); });

Unordered Set

std::string str = "pooaatat";

// Prints "poat"
std::unordered_set<char> container;
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return container.insert(c).second; });

Unordered Map

std::string str = "pooaatat";

// Prints "poat"
std::unordered_map<char, int> container;
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return container[c]++ == 0; });

Upvotes: 3

Sceptical Jule
Sceptical Jule

Reputation: 917

Built-in regular expressions should be efficient, i.e.

#include <regex>
[...]

const std::regex pattern("([\\w ])(?!\\1)");
string s = "ssha3akjssss42jj 234444 203488842882387 heeelloooo";
std::string result;

for (std::sregex_iterator i(s.begin(), s.end(), pattern), end; i != end; ++i)
    result.append((*i)[1]);

std::cout << result << std::endl;

Of course, you can modify the cpaturing group to your needs. The good thing is that it is supported in Visual Studio 2010 tr1 already. gcc 4.8, however, seems to have a problem with regex iterators.

Upvotes: 0

Abhishek Bansal
Abhishek Bansal

Reputation: 12715

AFAIK, there is no built-in algorithm for doing this. The std::unique algorithm is valid if you want to remove only consecutive duplicate characters.

However you can follow the following simple approach:

If the string contains only ASCII characters, you can form a boolean array A[256] denoting whether the respective character has been encountered already or not.

Then simply traverse the input string and copy the character to output if A[character] is still 0 (and make A[character] = 1).

In case the string contains arbitrary characters, then you can use a std::unordered_map or a std::map of char to int.

Upvotes: 3

Related Questions