pcd6623

Reputation: 746

Why doesn't Python "grouping" work for regular expressions in C?

Here is my Python program:

import re

print re.findall( "([se]{2,30})ting", "testingtested" )

Its output is:

['es']

That is what I expect: I get back "es" because I searched for 2 to 30 characters of "e" or "s" that are followed by "ting".

Here is my C program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

int main(void) {

    regex_t preg;
    regmatch_t pmatch;

    char string[] = "testingtested";

    //Compile the regular expression
    if ( regcomp( &preg, "([se]{2,30})ting", REG_EXTENDED ) ) {
        printf( "ERROR!\n" );
        return -1;
    } else {
        printf( "Compiled\n" );
    }

    //Do the search
    if ( regexec( &preg, string, 1, &pmatch, REG_NOTEOL ) ) {
        printf( "No Match\n" );
    } else {

        //Allocate memory on the stack for this
        char substring[pmatch.rm_eo - pmatch.rm_so + 1];

        //Copy the substring over
        printf( "%d %d\n", (int)pmatch.rm_so, (int)pmatch.rm_eo );
        strncpy( substring, &string[pmatch.rm_so], pmatch.rm_eo - pmatch.rm_so );

        //Make sure there's a null byte
        substring[pmatch.rm_eo - pmatch.rm_so] = 0;

        //Print it out
        printf( "Match\n" );
        printf( "\"%s\"\n", substring );
    }

    //Release the regular expression
    regfree( &preg );

    return EXIT_SUCCESS;
}

Its output is:

Compiled
1 7
Match
"esting"

Why is the C program including the "ting" in the result? And is there a way for me to exclude the "ting" portion?

Upvotes: 1

Views: 173

Answers (2)

R.. GitHub STOP HELPING ICE

Reputation: 215277

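With nmatch set to 1, pmatch receives only the whole match, not the first parenthesized subexpression. Python's re.findall, by contrast, returns just the group when the pattern contains exactly one.

Try changing pmatch to an array of 2 elements, passing 2 in place of 1 as the nmatch argument to regexec, and using the [1] element to get the subexpression match. Element [0] still holds the whole match.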
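A minimal sketch of that change, assuming a POSIX <regex.h>. The helper name first_group and its return convention are hypothetical and chosen only for illustration; it also passes 0 for eflags instead of the question's REG_NOTEOL, on the assumption that the subject is a complete string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of <regex.h>): copies the text matched by the
 * first parenthesized subexpression of `pattern` in `subject` into `out`.
 * Returns 0 on success, -1 if the pattern does not compile, nothing matches,
 * group 1 did not participate, or `out` is too small. */
int first_group(const char *pattern, const char *subject,
                char *out, size_t out_size)
{
    regex_t preg;
    regmatch_t pmatch[2];  /* [0] = whole match, [1] = first group */
    int rc = -1;

    assert(out != NULL && out_size > 0);

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Pass 2 as nmatch so regexec fills in both array elements. */
    if (regexec(&preg, subject, 2, pmatch, 0) == 0 && pmatch[1].rm_so != -1) {
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        if (len < out_size) {
            memcpy(out, subject + pmatch[1].rm_so, len);
            out[len] = '\0';  /* memcpy does not terminate the string */
            rc = 0;
        }
    }

    regfree(&preg);
    return rc;
}
```

Called with the pattern "([se]{2,30})ting" and the subject "testingtested", this should leave "es" in the output buffer, while pmatch[0] would still span "esting" (offsets 1 to 7).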

To others who have cited differences between C and Python and different types of regular expressions, that's all unrelated. This expression is very simple and that's not coming into play.

Upvotes: 3

Kos

Reputation: 72271

While regular expressions are "more or less the same everywhere", the exact supported features differ from implementation to implementation.

Unfortunately, you need to consult each regex library's documentation separately when designing your regular expressions.

Upvotes: 2
