Aniekan Umoren

Reputation: 33

Unwanted regex capturing

The following regex is supposed to match a date of the form YYYY-MM-DD sandwiched between two non-alphanumeric characters. It's supposed to extract only the date and not the two non-alphanumeric characters, but the result I get includes them as well. What am I doing wrong? PS: I already tried wrapping the [^:alnum:] in a non-capturing group (?:), but it didn't work.

regex exp1("[^:alnum:]([1-9][0-9]{3}(?:-[0-9][1-9]){2})[^:alnum:]")
//or
regex exp1("[^a-zA-Z0-9]([1-9][0-9]{3}(?:-[0-9][1-9]){2})[^a-zA-Z0-9]")

You can also try my regex on this website without having to write C++ code for it. Copy and paste the non-POSIX bracket expression (without the quotation marks) if you choose to use the site:

regex online tester

#include <regex>
#include <string>
#include <iostream>
#include <vector>

#define isthirty(x) for (int i = 0; i < 3; i++) {if (days[i] == x[1]) {thirty = true;break;}}
using namespace std;

int main() {
    vector<string> words;
    string str;
    getline(cin, str);
    int N = stoi(str);
    int days[] = { 4,6,9,11 };
    regex exp1("[^a-zA-Z0-9]([1-9][0-9]{3}(?:-[0-9][1-9]){2})[^a-zA-Z0-9]");
    for (int i = 0; i < N; i++) {
        getline(cin, str);
        sregex_iterator it(str.cbegin(), str.cend(), exp1);
        sregex_iterator end;
        for (; it != end; it++) {
            words.push_back(it->str(0));
        }
    }

    regex exp2("([0-9])+");
    for (auto &it : words) {
        int dates[3] = {};
        sregex_iterator pos(it.cbegin(), it.cend(), exp2);
        sregex_iterator end;
        str = it.substr(1,10);
        for (int i = 0; pos != end; pos++, i++) {
            dates[i] = stoi(pos->str(0));
        }
        if (dates[0] > 2016 || dates[1] > 12 || dates[2] > 31) {
            continue;
        }
        bool thirty = false;
        isthirty(dates);
        if (thirty && dates[2] <= 30) {
            cout << str << "\n";
        }
        else if(dates[1] == 2) {
            if (dates[0] % 4 == 0 && dates[2] <= 29) {
                cout << str << "\n";
            }
            else if (dates[0] % 4 != 0 && dates[2] <= 28) {
                cout << str << "\n";
            }
        }
        else if (dates[2] <= 31) {
            cout << str << "\n";
        }
    }
    return 0;
}

Upvotes: 0

Views: 62

Answers (2)

user6067118

Reputation:

Try a simpler regexp:

[^0-9]([0-9]{4}-[0-9]{2}-[0-9]{2})[^0-9]

It looks for a non-digit, then the YYYY-MM-DD date, then a non-digit. It captures the date. Works for almost all regexp flavours.
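As a rough sketch of how this pattern could be used from C++, the snippet below runs it through std::regex_search and reads capture group 1. The helper name first_date is made up for illustration and is not part of the question's code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first YYYY-MM-DD date that sits between
// two non-digit characters, or an empty string if there is none.
std::string first_date(const std::string& text) {
    static const std::regex pattern("[^0-9]([0-9]{4}-[0-9]{2}-[0-9]{2})[^0-9]");
    std::smatch m;
    if (std::regex_search(text, m, pattern)) {
        return m[1].str();  // group 1 is the date; m[0] also holds the delimiters
    }
    return "";
}
```

Because the pattern requires a character on both sides, a date at the very start or end of the input will not match unless you pad the string or loosen the pattern.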

Upvotes: 1

Steven

Reputation: 754

In the regex you've provided, the overall match (a.k.a. group 0) will include the two non-alphanumeric characters, but capture group 1 should contain only the date you're interested in. So, you could just use your regex as-is and extract the information from group 1. In your code, that means calling it->str(1) instead of it->str(0).
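Something along these lines should do it. This is only a sketch: extract_dates is a name I made up, and I've assumed [0-9]{2} for the month and day parts, since the [0-9][1-9] in the question rejects values ending in 0 such as 10 or 20.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every date captured by group 1 in `line`.
std::vector<std::string> extract_dates(const std::string& line) {
    // Assumption: [0-9]{2} replaces the question's [0-9][1-9] for month/day.
    static const std::regex pattern(
        "[^a-zA-Z0-9]([1-9][0-9]{3}(?:-[0-9]{2}){2})[^a-zA-Z0-9]");
    std::vector<std::string> dates;
    for (std::sregex_iterator it(line.cbegin(), line.cend(), pattern), end;
         it != end; ++it) {
        dates.push_back(it->str(1));  // str(0) would also include the delimiters
    }
    return dates;
}
```

With the dates stored this way, the later substr(1,10) call in the question would no longer be needed, because the strings in the vector are already bare dates.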

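If you actually want to change your regex so the match doesn't include the non-alphanumeric characters, you need to look into using a "positive lookbehind assertion" in place of the first bracket expression and a "positive lookahead assertion" in place of the last one. The assertions, even though they look a bit like other groups, don't include what they matched in the result. Be aware that std::regex with its default ECMAScript grammar supports lookahead but not lookbehind, so in C++ the lookbehind half requires a different regex library; with std::regex, reading group 1 remains the practical option.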
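For what it's worth, here is a sketch of the lookahead half on its own, which std::regex does accept; lookbehind is not available in its ECMAScript grammar, so the leading delimiter is still consumed and the date is still read from group 1. The name extract_dates_lookahead and the [0-9]{2} month/day parts are my own assumptions, not something from the question.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: like a plain group-1 extraction, but the trailing
// delimiter is only asserted with (?=...), not consumed.
std::vector<std::string> extract_dates_lookahead(const std::string& line) {
    static const std::regex pattern(
        "[^a-zA-Z0-9]([1-9][0-9]{3}(?:-[0-9]{2}){2})(?=[^a-zA-Z0-9])");
    std::vector<std::string> dates;
    for (std::sregex_iterator it(line.cbegin(), line.cend(), pattern), end;
         it != end; ++it) {
        dates.push_back(it->str(1));  // group 0 still starts with the delimiter
    }
    return dates;
}
```

The practical difference shows up when two dates share a single delimiter, as in " 2016-01-01 2016-02-02 ": since the trailing space is not consumed, it is still available as the leading delimiter of the next match.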

Upvotes: 0
