rraallvv

Reputation: 2933

C++ split string using a list of words as separators

I would like to split a string like this one

“this1245is@g$0,therhsuidthing345”

using a list of words like the one below

{“this”, “is”, “the”, “thing”}

into this list

{“this”, “1245”, “is”, “@g$0,”, “the”,  “rhsuid”, “thing”, “345”}
// ^--------------^---------------^------------------^-- these were the delimiters

A delimiter may appear more than once in the string to split, and a solution based on regular expressions is acceptable.

Precedence follows the order in which the delimiters appear in the array.

The platform I'm developing for has no support for the Boost library.

Update

This is what I have for the moment

#include <iostream>
#include <string>
#include <regex>

int main ()
{
    std::string s ("this1245is@g$0,therhsuidthing345");
    std::string delimiters[] = {"this", "is", "the", "thing"};

    for (int i=0; i<4; i++) {
        std::string delimiter =  "(" + delimiters[i] + ")(.*)";
        std::regex e (delimiter);   // matches the i-th delimiter followed by the rest of the string

        // default constructor = end-of-sequence:
        std::sregex_token_iterator rend;

        std::cout << "1st and 2nd submatches:";
        int submatches[] = { 1, 2 };
        std::sregex_token_iterator c ( s.begin(), s.end(), e, submatches );
        while (c!=rend) std::cout << " [" << *c++ << "]";
        std::cout << std::endl;
    }

    return 0;
}

output:

1st and 2nd submatches: [this] [1245is@g$0,therhsuidthing345]
1st and 2nd submatches: [is] [1245is@g$0,therhsuidthing345]
1st and 2nd submatches: [the] [rhsuidthing345]
1st and 2nd submatches: [thing] [345]

I think I need some kind of recursive call on each iteration.

Upvotes: 1

Views: 1402

Answers (3)

Snowhawk

Reputation: 672

Build the expression you want for matches only (re), then pass in {-1, 0} to your std::sregex_token_iterator to return all non-matches (-1) and matches (0).

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string s("this1245is@g$0,therhsuidthing345");
    std::regex re("(this|is|the|thing)");

    // -1 selects the text between matches, 0 selects the matches themselves.
    std::sregex_token_iterator iter(s.begin(), s.end(), re, { -1, 0 });
    std::sregex_token_iterator end;

    // The first token printed is empty because the input starts with a delimiter.
    while (iter != end) {
        // `std::cout << "[" << *iter++ << "] ";` works in vc13, but clang
        // requires a separate increment. I haven't gone into the implementation
        // to see if/how ssub_match is affected, so increment separately.
        std::cout << "[" << *iter << "] ";
        ++iter;
    }
}
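
The same idea can be extended to an arbitrary list of delimiters instead of a hard-coded pattern. Below is a minimal sketch, not a definitive implementation. It assumes the delimiters should be matched as literal text (so regex metacharacters are escaped), that the list contains no empty strings, and that empty tokens should be dropped. The names escape_regex and split_on_words are hypothetical helpers, not part of the standard library. Because ECMAScript alternation tries alternatives from left to right, putting the delimiters into the pattern in array order is one possible reading of the precedence requirement.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: prefix regex metacharacters with a backslash so that
// each delimiter is matched literally.
std::string escape_regex(const std::string& word) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : word) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: split `text` on any of `delimiters`, keeping the
// delimiters themselves as tokens and dropping empty tokens.
// Assumes no delimiter is an empty string.
std::vector<std::string> split_on_words(const std::string& text,
                                        const std::vector<std::string>& delimiters) {
    std::vector<std::string> tokens;
    if (delimiters.empty()) {
        if (!text.empty()) tokens.push_back(text);
        return tokens;
    }

    // Build "d0|d1|d2|..." in array order.
    std::string pattern;
    for (const std::string& d : delimiters) {
        if (!pattern.empty()) pattern += '|';
        pattern += escape_regex(d);
    }
    const std::regex re(pattern);

    // -1 yields the text between matches, 0 yields the matches themselves.
    std::sregex_token_iterator it(text.begin(), text.end(), re, {-1, 0});
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() > 0) tokens.push_back(it->str());
    }
    return tokens;
}
```

Calling split_on_words(s, {"this", "is", "the", "thing"}) on the string from the question should give the eight tokens listed there, without the empty leading token.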

Upvotes: 3

jxh

Reputation: 70382

I don't know how to perform the precedence requirement. This seems to work on the given input:

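#include <regex>
#include <string>
#include <vector>

std::vector<std::string> parse (std::string s)
{
    std::vector<std::string> out;

    // Capture the delimiter at the current position; ".*" consumes the rest,
    // because std::regex_match has to match the whole remaining range.
    std::regex re("(this|is|the|thing).*");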
    std::string word;

    auto i = s.begin();
    while (i != s.end()) {
        std::match_results<std::string::iterator> m;
        if (std::regex_match(i, s.end(), m, re)) {
            if (!word.empty()) {
                out.push_back(word);
                word.clear();
            }
            out.push_back(std::string(m[1].first, m[1].second));
            i += out.back().size();
        } else {
            word += *i++;
        }
    }
    if (!word.empty()) {
        out.push_back(word);
    }

    return out;
}
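
As for precedence, one possible reading is that at each position the delimiters are tried in array order and the first one that fits wins. Here is a regex-free sketch under that assumption. The name split_in_order is a hypothetical helper, and the scan has the same shape as parse above, with a plain string comparison in place of std::regex_match.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: at each position the delimiters are tried in array
// order, so earlier entries take precedence over later ones.
std::vector<std::string> split_in_order(const std::string& text,
                                        const std::vector<std::string>& delimiters) {
    std::vector<std::string> out;
    std::string word;  // accumulates the text between delimiters
    std::size_t pos = 0;

    while (pos < text.size()) {
        // Find the first delimiter (in array order) that starts at `pos`.
        const std::string* hit = nullptr;
        for (const std::string& d : delimiters) {
            if (!d.empty() && text.compare(pos, d.size(), d) == 0) {
                hit = &d;
                break;
            }
        }

        if (hit) {
            if (!word.empty()) {
                out.push_back(word);
                word.clear();
            }
            out.push_back(*hit);
            pos += hit->size();
        } else {
            word += text[pos++];
        }
    }
    if (!word.empty()) {
        out.push_back(word);
    }
    return out;
}
```

With this reading, the order of overlapping delimiters matters: splitting "thingy" with {"thin", "thing"} stops at "thin", while {"thing", "thin"} picks "thing".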

Upvotes: 2

tohava

Reputation: 5412

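#include <boost/algorithm/string.hpp>  // boost::split, boost::is_space

std::vector<std::string> strs;
boost::split(strs, line, boost::is_space());  // splits `line` at each whitespace character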

Upvotes: 1
