Reputation: 83
I do not understand why the regex pattern containing the \d character class does not work but [0-9] does. Character classes, such as \s (whitespace characters) and \w (word characters), do work. My compiler is gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3. I am using the C regular expression library.
Why doesn't \d work?
Text string:
const char *text = "148 apples 5 oranges";
For the above text string, this regex does not match:
const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$";
This regex matches when using [0-9] instead of \d:
const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$";
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#define N_MATCHES 30
// output from gcc --version: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
// compile command used: gcc -o tstc_regex tstc_regex.c
const char *text = "148 apples 5 oranges";
const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$"; // finds match
//const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$"; // does not find match
int main(int argc, char **argv)
{
    regex_t rgx;
    regmatch_t matches[N_MATCHES];
    int status;

    status = regcomp(&rgx, rstr, REG_EXTENDED | REG_NEWLINE);
    if (status != 0) {
        fprintf(stdout, "regcomp error: %d\n", status);
        return 1;
    }

    status = regexec(&rgx, text, N_MATCHES, matches, 0);
    regfree(&rgx); // release the compiled pattern once matching is done
    if (status == REG_NOMATCH) {
        fprintf(stdout, "regexec result: REG_NOMATCH (%d)\n", status);
    }
    else if (status != 0) {
        fprintf(stdout, "regexec error: %d\n", status);
        return 1;
    }
    else {
        fprintf(stdout, "regexec match found: %d\n", status);
    }
    return 0;
}
Upvotes: 8
Views: 3706
Reputation: 75242
The regex flavor you're using is GNU ERE, which is similar to POSIX ERE, but with a few extra features. Among these are support for the character class shorthands \s, \S, \w and \W, but not \d and \D. You can find more info here.
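To check which shorthands a given regex library accepts, a small probe can compile a pattern as an ERE and report whether it matches. This is only a sketch: `ere_matches` is a hypothetical helper name, not part of `<regex.h>`, and the result for `\d` depends on the C library in use.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of <regex.h>): returns 1 if `pattern`,
 * compiled as an extended regular expression, matches `text`; 0 if it
 * does not match; -1 if regcomp() rejects the pattern. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t rgx;
    int status;

    /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets. */
    if (regcomp(&rgx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    status = regexec(&rgx, text, 0, NULL, 0);
    regfree(&rgx);
    return status == 0 ? 1 : 0;
}
```

Calling `ere_matches("^\\d+$", "148")` next to `ere_matches("^[0-9]+$", "148")` shows whether the library treats `\d` as a digit class. With glibc the first call is expected not to match, as described above; other implementations may behave differently.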
Upvotes: 9
Reputation: 47264
In a strictly POSIX environment, neither pattern is likely to match, because \s and \w are extensions too. To make the pattern truly POSIX-compatible, use bracket expressions throughout:
const char *rstr = "^[[:digit:]]+[[:space:]]+[[:alpha:]]+[[:space:]]+[[:digit:]]+[[:space:]]+[[:alpha:]]+$";
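As a rough illustration of the bracket-expression pattern in use, the sketch below adds two capture groups around the digit runs and reads the numbers back through `regmatch_t`. The helper name `parse_counts` and the capture groups are assumptions added for the example. Note that `[[:alpha:]]` is narrower than `\w`, which also accepts digits and underscore; `[[:alnum:]_]` would be the closer equivalent.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: extracts the two counts from text shaped like
 * "148 apples 5 oranges". Returns 0 on success, -1 if the pattern fails
 * to compile or the text does not match. */
int parse_counts(const char *text, long *first, long *second)
{
    /* Only POSIX bracket expressions; groups 1 and 2 capture the numbers. */
    static const char *pattern =
        "^([[:digit:]]+)[[:space:]]+[[:alpha:]]+[[:space:]]+"
        "([[:digit:]]+)[[:space:]]+[[:alpha:]]+$";
    regex_t rgx;
    regmatch_t m[3]; /* m[0] is the whole match, m[1] and m[2] the groups */
    int status;

    if (regcomp(&rgx, pattern, REG_EXTENDED) != 0)
        return -1;
    status = regexec(&rgx, text, 3, m, 0);
    regfree(&rgx);
    if (status != 0)
        return -1;

    /* strtol stops at the first non-digit, which is the end of each group. */
    *first = strtol(text + m[1].rm_so, NULL, 10);
    *second = strtol(text + m[2].rm_so, NULL, 10);
    return 0;
}
```

Because the pattern avoids every backslash shorthand, it should behave the same on any conforming POSIX regex implementation.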
Upvotes: 5
Reputation: 5220
\d is a Perl and Vim character class; it is not part of POSIX ERE.
Use [[:digit:]] instead:
const char *rstr = "^[[:digit:]]+\\s+\\w+\\s+[[:digit:]]+\\s+\\w+$";
Upvotes: 1
Reputation: 126418
According to the POSIX regular expression spec:
An ordinary character is any character in the supported character set, except for the ERE special characters listed in ERE Special Characters. The interpretation of an ordinary character preceded by a backslash ( '\' ) is undefined.
So the only characters that can legally follow a \ are:
\^ \. \[ \$ \( \) \|
\* \+ \? \{ \\
all of which match the escaped character literally. Any other escape sequence, such as the PCRE shorthand \d, is undefined and may not work.
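The rule quoted above can be turned into a small portability check. The sketch below scans an ERE pattern and reports the first backslash that is not followed by one of the listed special characters. `first_undefined_escape` is a hypothetical helper written for this example. It skips bracket expressions, where POSIX treats a backslash as an ordinary character, and it makes no attempt to validate the rest of the pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first backslash in an ERE
 * pattern whose meaning POSIX leaves undefined (a backslash followed by a
 * non-special character, or a trailing backslash), or -1 if none is found. */
int first_undefined_escape(const char *p)
{
    /* Characters that may legally follow a backslash in a POSIX ERE. */
    static const char special[] = "^.[$()|*+?{\\";
    size_t i = 0;

    while (p[i] != '\0') {
        if (p[i] == '[') {
            /* Skip the whole bracket expression. */
            i++;
            if (p[i] == '^') i++;
            if (p[i] == ']') i++; /* a leading ']' is a literal */
            while (p[i] != '\0' && p[i] != ']') {
                if (p[i] == '[' &&
                    (p[i + 1] == ':' || p[i + 1] == '.' || p[i + 1] == '=')) {
                    /* Skip [:class:], [.coll.] or [=equiv=] as a unit. */
                    char kind = p[i + 1];
                    i += 2;
                    while (p[i] != '\0' && !(p[i] == kind && p[i + 1] == ']'))
                        i++;
                    if (p[i] != '\0') i += 2;
                } else {
                    i++;
                }
            }
            if (p[i] == ']') i++;
        } else if (p[i] == '\\') {
            if (p[i + 1] == '\0' || strchr(special, p[i + 1]) == NULL)
                return (int)i;
            i += 2; /* defined escape: skip the backslash and its operand */
        } else {
            i++;
        }
    }
    return -1;
}
```

For the pattern from the question, written in C as `"^\\d+\\s+\\w+$"`, the function flags the `\d` at index 1, while an all-bracket-expression pattern passes cleanly.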
Upvotes: 1