Forever Learner


Regex is not working in C

I am using a regex that works in the shell but not inside my C program.

Any thoughts please?

echo "abc:[email protected]" | grep -E "(\babc\b|\bdef\b):[0-9]{10}@([A-Za-z0-9].*)"   # shell

reti = regcomp(&regex,"(\babc\b|\bdef\b):[0-9]{10}@([A-Za-z0-9].*)", 0); //c program

Upvotes: 7

Views: 1273

Answers (2)

user557597


Word Boundary Reference

General
POSIX

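From the above links it appears that POSIX-style regex libraries can support their own word-boundary constructs, [[:<:]] and [[:>:]].
Note that these are not character classes, despite the bracket syntax. They come from the BSD regex library (documented in re_format(7) on BSD and macOS) rather than from the POSIX standard itself, and glibc does not appear to accept them.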

Given that, and using ERE (the REG_EXTENDED flag) rather than BRE, you should be able to write:

reti = regcomp(&regex,"[[:<:]](abc|def)[[:>:]]:[0-9]{10}@([A-Za-z0-9].*)", REG_EXTENDED);

Or, since the transition from c/f to : is already a word boundary, the trailing [[:>:]] can be dropped:

reti = regcomp(&regex,"[[:<:]](abc|def):[0-9]{10}@([A-Za-z0-9].*)", REG_EXTENDED);

I haven't tested this, but it should work on implementations that support these constructs.
Since it is unclear how an engine implements them internally, it may be better to
stick with exactly this documented syntax.

Some engines that offer a POSIX mode, such as Boost.Regex, use \< and \> for these instead.

Upvotes: 1

Wiktor Stribiżew


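grep -E uses POSIX extended regular expression (ERE) syntax, in which the {n,m} interval braces and the ( ) grouping parentheses are written unescaped. regcomp called with flags 0 uses basic regular expressions (BRE), where they must be written as \{ \} and \( \); unescaped, they are treated as literal characters.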

You need to pass the REG_EXTENDED flag to regcomp. Also, since POSIX regular expressions have no portable word-boundary construct, replace the first \b with a (^|[^[:alnum:]_]) "equivalent". No trailing \b is needed, because a : follows it in the pattern:

const char *str_regex = "(^|[^[:alnum:]_])(abc|def):[0-9]{10}@([A-Za-z0-9].*)";

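The (^|[^[:alnum:]_]) part matches either the start of the string (^) or (|) a single character that is neither alphanumeric nor an underscore. Because this group consumes that character, abc|def is captured by group 2, not group 1.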

Full C demo:

#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

int main (void)
{
  int match;
  int err;
  regex_t preg;
  regmatch_t pmatch[4];
  size_t nmatch = 4;
  const char *str_request = "abc:[email protected]";

  const char *str_regex = "(^|[^[:alnum:]_])(abc|def):[0-9]{10}@([A-Za-z0-9].*)";
  err = regcomp(&preg, str_regex, REG_EXTENDED);
  if (err == 0)
    {
      match = regexec(&preg, str_request, nmatch, pmatch, 0);
      /* pmatch holds plain offsets, so it stays valid after regfree */
      regfree(&preg);
      if (match == 0)
        {
          /* group 2 = abc|def, group 3 = text after '@';
             the offsets are cast to int because %.*s expects an int */
          printf("\"%.*s\"\n", (int)(pmatch[2].rm_eo - pmatch[2].rm_so), &str_request[pmatch[2].rm_so]);
          printf("\"%.*s\"\n", (int)(pmatch[3].rm_eo - pmatch[3].rm_so), &str_request[pmatch[3].rm_so]);
        }
      else if (match == REG_NOMATCH)
        {
          printf("unmatch\n");
        }
    }
  else
    {
      /* report why the pattern failed to compile */
      char msg[256];
      regerror(err, &preg, msg, sizeof msg);
      fprintf(stderr, "regcomp failed: %s\n", msg);
      return EXIT_FAILURE;
    }
  return 0;
}

Upvotes: 4
