Matt Parkins

Pulling out data from a line of text in C++ using regex

I have a text file in the format:

number tab word tab word tab junk
number tab word tab word tab junk
number tab word tab word tab junk
number tab word tab word tab junk
number tab word tab word tab junk

For each line I'd like to put the number in a uint32_t, then the two words into strings and then ignore the rest of the line. I could do this by loading the file into memory and then working through it a byte at a time, but I'm convinced that a lovely regex could do it for me. Any ideas?

I'm working in C++ using #include in Xcode. This is a command-line tool, so there's no real output; I'm just storing the data to compare with other data.

Answers (2)

zx81

Matt, you can use this simple regex:

(?im)^(\d+)\t([a-z]+)\t([a-z]+)

It captures the number in Group 1, the first word in Group 2, and the second word in Group 3.
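To check what each group captures in standard C++ (the <regex> header that ships with Xcode's libc++), here is a minimal sketch that applies the same idea to a single line with std::regex_search. It assumes the line has already been read, e.g. with std::getline; ParsedLine and ParseLine are hypothetical names. std::regex's default ECMAScript grammar does not accept the inline (?im) modifiers, so they are dropped, and [A-Za-z] stands in for the case-insensitive [a-z].

```cpp
#include <cstdint>
#include <limits>
#include <regex>
#include <string>

// Hypothetical result type for one parsed line.
struct ParsedLine {
    std::uint32_t number;
    std::string word1;
    std::string word2;
};

// Matches "number<TAB>word<TAB>word" at the start of `line` and ignores
// whatever follows. Returns false if the line does not have that shape
// or the number does not fit in a uint32_t.
bool ParseLine(const std::string &line, ParsedLine &out)
{
    static const std::regex re("^(\\d+)\\t([A-Za-z]+)\\t([A-Za-z]+)");
    std::smatch m;
    if (!std::regex_search(line, m, re))
        return false;

    // Group 1 is all digits; stoull only throws std::out_of_range for
    // values beyond unsigned long long.
    unsigned long long value = std::stoull(m[1].str());
    if (value > std::numeric_limits<std::uint32_t>::max())
        return false;

    out.number = static_cast<std::uint32_t>(value);
    out.word1 = m[2].str();  // Group 2: first word
    out.word2 = m[3].str();  // Group 3: second word
    return true;
}
```

Calling ParseLine once per line from a std::getline loop sidesteps the multiline question entirely, because ^ only has to match at the start of each line string.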

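To retrieve the values from Groups 1, 2 and 3, I am not sure of the exact C++ syntax for your setup, but this code stub gives one idea of how to iterate over the matches and groups. It is written for the TRegEx class from Embarcadero C++Builder, not for standard C++, so it will not compile as-is in Xcode. Note that in this case we don't care about the overall matches, just the capturing groups.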

try {
    TRegEx RegEx("(?im)^(\\d+)\t([a-z]+)\t([a-z]+)", TRegExOptions() << roIgnoreCase << roMultiLine);
    TMatch Match = RegEx.Match(SubjectString);
    while (Match.Success) {
        for (int i = 1; i < Match.Groups.Count; i++) {
            TGroup Group = Match.Groups[i];
            if (Group.Success) {
                // matched text: Group.Value
                // match start: Group.Index
                // match length: Group.Length
            } 
        }
        Match = Match.NextMatch();
    } 
} catch (ERegularExpressionError *ex) {
    // Syntax error in the regular expression
}
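A rough standard-C++ counterpart of the TRegEx loop above, for use in Xcode: std::sregex_iterator walks every match in the whole file contents, much like calling Match.NextMatch() repeatedly. This is a sketch that assumes the file has already been read into one std::string; Record and ExtractRecords are hypothetical names. Because ^ in std::regex's default ECMAScript grammar matches only at the very start of the input, the pattern anchors each match with (?:^|\n) instead of a multiline option. (?:...) does not capture, so the number and the two words are still Groups 1, 2 and 3.

```cpp
#include <cstdint>
#include <limits>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record type: the number and the two words from one line.
struct Record {
    std::uint32_t number;
    std::string word1;
    std::string word2;
};

// Collects every "number<TAB>word<TAB>word" that starts a line in `text`.
// (?:^|\n) matches at the start of the input or just after a newline;
// the junk after the second word is skipped by the next search.
std::vector<Record> ExtractRecords(const std::string &text)
{
    static const std::regex re("(?:^|\\n)(\\d+)\\t([A-Za-z]+)\\t([A-Za-z]+)");
    std::vector<Record> records;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::smatch &m = *it;  // m[1], m[2], m[3] are the capture groups
        unsigned long long value = std::stoull(m[1].str());
        if (value > std::numeric_limits<std::uint32_t>::max())
            continue;  // does not fit in uint32_t; skip this line
        records.push_back({static_cast<std::uint32_t>(value), m[2].str(), m[3].str()});
    }
    return records;
}
```

Lines that do not start with the expected pattern are simply passed over, which mirrors how the TRegEx loop only reports successful matches.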

Michael J

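#include <cstdint>
#include <fstream>
#include <limits>
#include <string>

// Supplied elsewhere: handles one parsed line, returns false to stop.
extern bool DoStuff(std::uint32_t n,
                    const std::string &s0,
                    const std::string &s1);

bool ProcessFile(const std::string &sFileName)
{
    std::ifstream ifs(sFileName);
    if (!ifs)
        return false;

    std::uint32_t n;
    std::string s0, s1;
    // operator>> skips leading whitespace (tabs included), so this reads the
    // number and the two words. Testing the read itself means DoStuff is never
    // called with stale values after the last line.
    while (ifs >> n >> s0 >> s1)
    {
        if (!DoStuff(n, s0, s1))
            return false;
        // Throw away the rest of the line (the junk column).
        ifs.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    // eof() is set when the loop stopped because the input ran out; if it is
    // not set, a line failed to parse before the end of the file.
    return ifs.eof() && !ifs.bad();
}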
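One limitation of the >> approach above is that operator>> splits on any whitespace, so it only works if the words never contain spaces. If the fields are strictly tab-separated, a variant using std::getline with a '\t' delimiter keeps spaces inside a field. This is a minimal sketch that collects the results in a vector, since the question says the data is only stored for comparison; Entry and ReadEntries are hypothetical names, and the function takes any std::istream, so an std::ifstream can be passed for a file.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for one line.
struct Entry {
    std::uint32_t number;
    std::string word1;
    std::string word2;
};

// Reads "number<TAB>word<TAB>word<TAB>junk" lines from `in`.
// Lines that do not have that shape are skipped.
std::vector<Entry> ReadEntries(std::istream &in)
{
    std::vector<Entry> entries;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string num, w1, w2;
        // Split on tabs only, so a field may contain spaces.
        if (!std::getline(fields, num, '\t') ||
            !std::getline(fields, w1, '\t') ||
            !std::getline(fields, w2, '\t'))
            continue;
        // The rest of the line (the junk column) is never read.

        std::istringstream numStream(num);
        std::uint32_t n;
        // Reject non-numbers, values too large for uint32_t, and "12abc".
        if (!(numStream >> n) || !numStream.eof())
            continue;
        entries.push_back({n, w1, w2});
    }
    return entries;
}
```

Usage with a file would be std::ifstream ifs("data.txt"); followed by ReadEntries(ifs), where the file name is only an example.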
