neo

Reputation: 1009

Parsing Regex in C

What regular expression should I use to match the format below?

000000|000110|12|11|alphanumeric value

6 digits|6 digits|2 digits|2 digits|alphanumeric value (may include spaces)

I tried the code below with the regex (^(\\d{6})|(\\d{6})|(\\d{2})|(\\d{2})|(([a-zA-Z0-9 ])*)$), but it doesn't work as expected:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

int main()
{

   int res;

   char err_buf[BUFSIZ];
   char src[] = "000000|000110|12|11|alphanumeric value";  

   const char* pattern = "(^(\\d{6})|(\\d{6})|(\\d{2})|(\\d{2})|(([a-zA-Z0-9 ])*)$)";
   regex_t preg;

   regmatch_t pmatch[100];

   if( (res = regcomp(&preg, pattern, REG_EXTENDED)) != 0)
   {
      regerror(res, &preg, err_buf, BUFSIZ);
      printf("regcomp: %s\n", err_buf);
      exit(res);
   }
 //   res=regcomp(&preg, src,REG_EXTENDED);
   res = regexec(&preg, src, 100, pmatch, REG_NOTBOL);
   //~ res = regexec(&preg, src, 10, pmatch, 0);
   //~ res = regexec(&preg, src, 10, pmatch, REG_NOTEOL);

   if(res == 0)
   {
      printf("Match Found\n");
   }
   else if(res == REG_NOMATCH)
   {
      printf("NO match\n");
      exit(0);
   }
   regfree(&preg);
   return 0;
}

Thanks in advance.

Upvotes: 4

Views: 4202

Answers (1)

Thomas Ayoub

Reputation: 29431

Since pipes are metacharacters and you want to match literal | characters, you need to escape them. If you write only \| in the source, the C compiler interprets the backslash as a string escape, hence the error you get. Use \\| just as you did with \\d, so that the string itself contains a literal \|.

Thus your regex will be ^(\\d{6})\\|(\\d{6})\\|(\\d{2})\\|(\\d{2})\\|([a-zA-Z0-9 ]*)$ (I took the liberty of rewriting the last group).

As Jonathan noticed, you're using POSIX regex, which doesn't support \d. You can use [0-9] if you only want to match ASCII digits, or the POSIX class [[:digit:]] if you want to match whatever the locale classifies as a digit. Hence:

^([0-9]{6})\\|([0-9]{6})\\|([0-9]{2})\\|([0-9]{2})\\|([a-zA-Z0-9 ]*)$

Upvotes: 2
