user3111311

Reputation: 8001

How to extract parts from regex in C++?

For example, I have patterns like this (each new line means "followed by"):

a delimiter string,
a name,
a ':' character,
a list of Xs, where each X is a name followed by a ';' character.

I can use regex for matching, but is there a way to not only match, but also extract parts from the pattern? For example:

$DatasetName: A; B; C;

is a given string, and I would like to extract the dataset name, and then the column names A, B, and C.

Upvotes: 0

Views: 131

Answers (1)

Rudolfs Bundulis

Reputation: 11944

Well, as already suggested, you could parse it by hand along these lines (this is for demonstration purposes only and does not claim to be perfect):

#include <iostream>
#include <vector>
#include <string>

bool parse_by_hand(const std::string& phrase)
{
    enum parse_state
    {
        parse_name,
        parse_value,
    };
    std::string name, current_value;
    std::vector<std::string> values;
    parse_state state = parse_name;
    for(std::string::const_iterator iterator = phrase.begin(); iterator != phrase.end(); iterator++)
    {
        switch(state)
        {
        case parse_name:
            if(*iterator != ':')
                name += *iterator;
            else 
                state = parse_value;
            break;
        case parse_value:
            if(*iterator != ';')
                current_value += *iterator;
            else
            {
                // ';' ends the current value; stay in parse_value for the next one
                values.push_back(current_value);
                current_value.clear();
            }
            break;
        default:
            return false;
        }
    }
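    // Succeed only if a name and at least one value were parsed and no
    // unterminated value is left over (every value must end with ';').
    // Note: the name still contains the leading '$' and the values keep
    // their leading spaces; trim them if you need clean strings.
    return !name.empty() && !values.empty() && current_value.empty();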
}

int main(int argc, char** argv)
{
    std::string phrase("$DatasetName: A; B; C;");
    parse_by_hand(phrase);
}

As for std::regex, my first attempt was something like ([^:]*):(([^;]*);)* but, unless I'm mistaken (and I hope someone corrects me if I am), a repeated capture group only gives you the last matched value, not all of them. You would therefore still have to iterate over the list with regex_search, which takes away the convenience of one-liner regex matching. Alternatively, if std::regex is not a must and you can use Boost, take a look at Repeated captures, which should solve the capture group issue.
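For what it's worth, here is a rough sketch of the two-step std::regex approach: one regex_match to validate the line and capture the name plus the raw list, then a std::sregex_iterator pass over that list to pull out every item. The function name parse_dataset and its out-parameters are made up for illustration, and the patterns assume that the delimiter is a literal '$' and that names consist only of word characters (\w); adjust them if your real input differs.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): splits "$Name: A; B; C;"
// into the dataset name and the list of column names.
// Returns false and leaves the outputs untouched if the line does not match.
bool parse_dataset(const std::string& phrase,
                   std::string& name,
                   std::vector<std::string>& columns)
{
    // Step 1: validate the whole line.
    // Group 1 = name, group 2 = the raw "A; B; C;" list.
    // Assumption: '$' is the delimiter and names are made of \w characters.
    static const std::regex line_re(R"(\$(\w+)\s*:((?:\s*\w+\s*;)+)\s*)");
    std::smatch line;
    if (!std::regex_match(phrase, line, line_re))
        return false;

    name = line[1].str();
    columns.clear();

    // Step 2: a repeated group would only keep its last repetition,
    // so iterate over the captured list to collect every item.
    static const std::regex item_re(R"((\w+)\s*;)");
    const std::string list = line[2].str();
    for (std::sregex_iterator it(list.begin(), list.end(), item_re), end;
         it != end; ++it)
    {
        columns.push_back((*it)[1].str());
    }
    return true;
}
```

Calling parse_dataset("$DatasetName: A; B; C;", name, columns) should leave name as DatasetName and columns as A, B and C, without the '$' or the surrounding spaces. It is still two regexes rather than a one-liner, but the validation and the extraction stay separate and easy to read.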

Upvotes: 1
