ElevenCents

Reputation: 73

C++ regex capture is dropping last char in email validator

C++ Shell Online Execution Link: http://cpp.sh/5z2uq

I am writing a regex to validate an email ID whose local name can contain multiple dots and plus characters, and whose domain name can contain only one dot.

The problem is with the capture groups. The domain name capture (group #2) works as expected, as the output shows. The local name capture (group #1) has two problems: it should stop before the '+' sign (excluding the '+' itself) but does not, and the captured text is missing its last character.

Please take a look at my C++ regex code:

#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;
int main()
{
    string str;
    vector<string> emails = {
            "[email protected]",
            "[email protected]",
            "[email protected]",
            "[email protected]",
            "[email protected]"
        };

    for(auto ele : emails)
    {
        str = ele;
        
        regex e("([\\w+\\.]+)\\+*[\\+\\w]+\\@([\\w]+\\.[\\w]+)$");
        smatch parts;
        bool match = regex_match(str,parts,e);
        
        if(match==true)
        {
            cout << "Local  : " << parts.str(1) << endl;
            cout << "Domain : " << parts.str(2) << endl;
            cout << "Valid Email ID: " << ele << endl << endl;
        }
        else
        {
            cout << "Invalid Email ID: " << ele << endl << endl;
        }
    }

    return 0;
}

Output:

Local : loca
Domain : domain.com
Valid Email ID: [email protected]

Local : local.constan
Domain : domain.com
Valid Email ID: [email protected]

Local : local+addo
Domain : domain.com
Valid Email ID: [email protected]

Local : local.constant+addo
Domain : domain.com
Valid Email ID: [email protected]

Invalid Email ID: [email protected]

Notice how the captured local name (group #1) is missing its last character in every match.

Questions:

  1. How do I make group #1 capture only up to, and not including, the '+' sign?
  2. How do I stop the group capture from dropping the last character?

Upvotes: 1

Views: 76

Answers (1)

mcernak

Reputation: 9130

You can use this expression:

"([\\w.]+)(?:\\+[\\w]+)*\\@([\\w]+\\.[\\w]+)$"

The first part ([\\w.]+) captures the local part (one or more word characters or dots; the plus sign is deliberately left out of the character class).
The second part (?:\\+[\\w]+)* is a non-capturing group repeated 0 or more times, each repetition matching a plus sign followed by one or more word characters.
The third part \\@ matches the @ character.
The last part ([\\w]+\\.[\\w]+) captures the domain part (two words separated by exactly one dot), which you already got right.
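
As for why the original pattern dropped a character: the first group [\\w+\\.]+ is greedy and also accepts '+', so it runs all the way up to the '@'. The following [\\+\\w]+ still has to match at least one character, so the engine backtracks and takes exactly one character away from group #1. Here is a minimal sketch of the suggested expression in use. The helper name split_email and its out-parameters are only illustrative, and the pattern is written as a raw string literal with a plain @ instead of \\@.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name and signature are illustrative, not from the question).
// Returns true if `email` matches, and fills `local` and `domain` from the captures.
bool split_email(const std::string& email, std::string& local, std::string& domain)
{
    // Group 1: local part (word characters and dots only, so it stops before any '+').
    // (?:\+\w+)*: zero or more "+tag" suffixes, matched but not captured.
    // Group 2: domain made of two words separated by exactly one dot.
    static const std::regex pattern(R"(([\w.]+)(?:\+\w+)*@(\w+\.\w+))");

    std::smatch parts;
    // regex_match requires the whole string to match, so no trailing $ is needed.
    if (!std::regex_match(email, parts, pattern))
        return false;

    local = parts.str(1);
    domain = parts.str(2);
    return true;
}
```

With this, a call such as split_email("local.constant+addon@domain.com", local, domain) should return true with local set to "local.constant" and domain set to "domain.com", while an address with two dots in the domain is rejected.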

Upvotes: 1
