Noir

No matches for a working regex in C

I want to match the regex (?<=SEARCH_THIS=").+(?<!"\n) in C with PCRE.

However, the following code doesn't work as expected.

#include <pcreposix.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>


int main(void){
    regex_t re;
    regmatch_t matches[2];
    char *regex = "(?<=SEARCH_THIS=\").+(?<!\"\n)";
    char *file = "NO_MATCH=\"0\"\nSOMETHING_ELSE=\"1\"\nSOME_STUFF=\"1\"\nSEARCH_THIS=\"gimme that\"\nNOT_THIS=\"foobar\"\nTHIS_NEITHER=\"test\"\n";

    puts("compiling regex");
    int compErr = regcomp(&re, regex, REG_NOSUB | REG_EXTENDED);
    if(compErr != 0){
        char buffer[128];
        regerror(compErr, &re, buffer, 100);
        printf("regcomp failed: %s\n", buffer);
        return 0;
    }
    puts("executing regex");
    int err = regexec(&re, file, 2, matches, 0);
    if(err == 0){
        puts("no error");
        printf("heres the match: [.%*s]",matches[0].rm_eo-matches[0].rm_so,file+matches[0].rm_so);
    } else {
        puts("some error here!");
        char buffer[128];
        regerror(err, &re, buffer, 100);
        printf("regexec failed: %s\n", buffer);
    }
    return 0;
}

The console output is:

compiling regex
executing regex
some error here!
regexec failed: No match

I verified that this regex works here. Any idea what is going wrong?

EDIT #1

Compiler Version

 $ arm-merlin-linux-uclibc-gcc --version
 arm-merlin-linux-uclibc-gcc (GCC) 4.2.1
 Copyright (C) 2007 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compile Command

 $ arm-merlin-linux-uclibc-gcc -lpcre ./re_test.c -o re_test.o

Answers (1)

Drew McGowen

There are actually a few issues with your code.

First, you use %*s in an attempt to restrict the length of the printed string. However, the integer supplied through * is a minimum field width, not a limit: if the string is shorter, it is padded with spaces, and if it is longer, the whole string is printed anyway. You'll need some other way of restricting the length of the output, such as a precision (%.*s), which sets a maximum number of characters. Just avoid modifying *file, because file points to a string literal and modifying one is undefined behavior.

Second, you pass the REG_NOSUB flag to regcomp, but according to the man page this means that no substring positions are stored in the pmatch argument. Thus, even if your regexec call did succeed, the following printf would read uninitialized values (which is undefined behavior).

Finally, I suspect the problem is that \" and \n need to be double-escaped, i.e. written as \\\" and \\n in your regex string. The code you gave worked for me as-is (Ubuntu 14.04 x64), but the double-escaped version works too.

Taking all of this into account, this is the output I get:

compiling regex
executing regex
no error
heres the match: [.gimme that"]
