I need to match the following line with multiple capturing groups:
0.625846 29Si 29 [4934.39 0] [0.84 100000000000000.0]
I use the regex:
^(0+\.[0-9]?e?[+-]?[0-9]+)\s+([0-9]+\.?[0-9]*|[0-9][0-9]?[0-9]?[A-Z][a-z]?)\s+([0-9][0-9]?[0-9]?)\s+(\[.*\])\s+(\[.*\])$
(see this link for a regex101 workspace). However, when I try the match using regex.h, it behaves differently on OSX and Linux, specifically:
Fails on: OSX: 10.14.6 LLVM: 10.0.1 (clang-1001.0.46.4)
Works on: linux: Ubuntu 18.04 g++: 7.5.0
I put together a short program that reproduces the problem, compiled with g++ regex.cpp -o regex:

```cpp
#include <cstdio>
#include <string>
// POSIX regex API
#include <regex.h>
using namespace std;

int main(int argc, char** argv) {
    // buffer for error messages produced by regerror()
    char buffer[100];
    // compiled regex object
    regex_t regex;
    //*****regex match and input file line*******
    string iline = "0.625846 29Si 29 [4934.39 0] [0.84 100000000000000.0]";
    string matchfile = "^(0+\\.[0-9]?e?[+-]?[0-9]+)\\s+([0-9]+\\.?[0-9]*|[0-9][0-9]?[0-9]?[A-Z][a-z]?)\\s+([0-9][0-9]?[0-9]?)\\s+(\\[.*\\])\\s+(\\[.*\\])$";
    // compile the regex
    int reti = regcomp(&regex, matchfile.c_str(), REG_EXTENDED);
    if (reti != 0) {
        regerror(reti, &regex, buffer, sizeof(buffer));
        printf("regcomp() failed with '%s'\n", buffer);
        return 1;
    }
    printf("regex compile success!\n");
    // match the input line: slot 0 is the whole match, slots 1-5 the groups
    regmatch_t input_matchptr[6];
    reti = regexec(&regex, iline.c_str(), 6, input_matchptr, 0);
    if (reti == 0) {
        printf("regex match success!\n");
    } else {
        regerror(reti, &regex, buffer, sizeof(buffer));
        printf("regexec() failed with '%s'\n", buffer);
    }
    // release the memory allocated by regcomp()
    regfree(&regex);
    //******************************************
    return 0;
}
```
I have also modified my regex to comply with POSIX (which I understand LLVM requires) by removing the lazy +? and *? quantifiers I used previously, as per this post, but I may have missed something that makes it incompatible with POSIX. The regex now compiles, which makes me think it is valid, but I still don't understand why no match is obtained.
How can I modify my regex to correctly match?
To answer the immediate question, you need to use
string matchfile="^(0+\\.[0-9]?e?[+-]?[0-9]+)[[:space:]]+([0-9]+\\.?[0-9]*|[0-9][0-9]?[0-9]?[A-Z][a-z]?)[[:space:]]+([0-9][0-9]?[0-9]?)[[:space:]]+(\\[.*\\])[[:space:]]+(\\[.*\\])$";
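That is, instead of the Perl-like \s, use the [:space:] POSIX character class inside a bracket expression, i.e. [[:space:]]. \s is not defined by POSIX regular expressions: glibc on Linux accepts it as a GNU extension, which is why the original pattern works there, but the macOS C library does not treat it as whitespace in plain REG_EXTENDED mode.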
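To check the fix and actually read the captured values, a minimal sketch along these lines may help. It assumes only the POSIX regcomp()/regexec() API already used in the question; extract_groups is a hypothetical helper name, and an optional group that did not participate in the match is returned as an empty string.

```cpp
#include <regex.h>
#include <string>
#include <vector>

// Matches `line` against the POSIX extended regex `pattern` and returns the
// text of each capture group, with index 0 holding the whole match.
// Returns an empty vector if the pattern fails to compile or does not match.
// `extract_groups` is a hypothetical helper used only for illustration.
std::vector<std::string> extract_groups(const std::string& pattern,
                                        const std::string& line,
                                        size_t ngroups) {
    std::vector<std::string> groups;
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED) != 0) {
        return groups;
    }
    // one slot for the whole match plus one per capture group
    std::vector<regmatch_t> m(ngroups + 1);
    if (regexec(&re, line.c_str(), m.size(), m.data(), 0) == 0) {
        for (const regmatch_t& g : m) {
            if (g.rm_so == -1) {
                groups.emplace_back();  // group did not participate
            } else {
                groups.emplace_back(line.substr(g.rm_so, g.rm_eo - g.rm_so));
            }
        }
    }
    // regcomp() succeeded, so the compiled pattern must be freed
    regfree(&re);
    return groups;
}
```

With the [[:space:]] version of matchfile shown above and the input line from the question, extract_groups(matchfile, iline, 5) should return the whole line followed by 0.625846, 29Si, 29, [4934.39 0] and [0.84 100000000000000.0].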
You mention that you tried [:space:] outside of a bracket expression and it did not work; that is expected. As per Character Classes, [:digit:] is a POSIX character class, used inside a bracket expression like [x-z[:digit:]]. This means that POSIX character classes are only parsed as such when used inside bracket expressions. Written bare, [:space:] is itself an ordinary bracket expression that matches a single character from the set :, s, p, a, c, e.
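The difference between a bare [:space:] and [[:space:]] can be seen with a tiny check. This is a sketch assuming POSIX regcomp()/regexec() in REG_EXTENDED mode; matches_ere is a hypothetical helper name.

```cpp
#include <regex.h>
#include <string>

// Returns true if `text` matches the POSIX extended regex `pattern`.
// A pattern that fails to compile is treated as "no match".
// `matches_ere` is a hypothetical helper used only for illustration.
bool matches_ere(const std::string& pattern, const std::string& text) {
    regex_t re;
    // REG_NOSUB: only a yes/no answer is needed, not capture offsets
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }
    bool ok = regexec(&re, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);
    return ok;
}
```

With this helper, matches_ere("^a[[:space:]]b$", "a b") is true, while matches_ere("^a[:space:]b$", "a b") is false and matches_ere("^a[:space:]b$", "asb") is true, because the bare form only matches one of the characters :, s, p, a, c, e.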