Dancrumb

Reputation: 27539

How do I ensure that a regex does not match an empty string?

I'm using the Jison parser generator for JavaScript and am having problems with my language specification.

The program I'm writing will be a calculator that can handle feet, inches and sixteenths. In order to do this, I have the following specification:

%%
([0-9]+\s*"'")?\s*([0-9]+\s*"\"")?\s*([0-9]+\s*"s")? {return 'FIS';}
[0-9]+("."[0-9]+)?\b  {return 'NUMBER';}
\s+                   {/* skip whitespace */}
"*"                   {return '*';}
"/"                   {return '/';}
"-"                   {return '-';}
"+"                   {return '+';}
"("                   {return '(';}
")"                   {return ')';}
<<EOF>>               {return 'EOF';}

Most of these lines come from a basic calculator specification. I simply added the first line.

The regex correctly matches feet, inches and sixteenths, such as 6'4" (six feet, four inches) or 4"5s (four inches, five sixteenths), with any amount of whitespace between the numbers and the unit indicators.

The problem is that the regex also matches an empty string. As a result, the lexer always emits a FIS token at the start of the line, and then parsing fails.
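
The empty match can be reproduced outside Jison. The sketch below is a plain-JavaScript translation of the FIS rule (Jison's quoted literals "'" and "\"" written as bare characters); the helper name matchAt and the use of the sticky flag to imitate a lexer position are assumptions for illustration, not how Jison works internally.

```javascript
// FIS rule translated to a JavaScript regex. The sticky flag "y" forces the
// match to start exactly at lastIndex, roughly like a lexer position.
const fisRule = /([0-9]+\s*')?\s*([0-9]+\s*")?\s*([0-9]+\s*s)?/y;

// Hypothetical helper: return the text matched at position pos, or null.
function matchAt(re, input, pos) {
  re.lastIndex = pos;
  const m = re.exec(input);
  return m === null ? null : m[0];
}

// A real feet/inches value is matched in full.
console.log(JSON.stringify(matchAt(fisRule, `6'4"`, 0)));

// A plain arithmetic expression still "matches", but with zero length,
// because every group in the rule is optional.
console.log(JSON.stringify(matchAt(fisRule, "3 + 4", 0)));
```

The second call returns an empty string rather than null, which is the zero-length FIS token described above.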

Here is my question: is there a way to modify this regex to guarantee that it will only match a non-zero length string?

EDIT: Although the regex has capturing groups in it, I do not need to capture those groups. I know I could use non-capturing groups, but it's a little clearer without the (?:...).

Upvotes: 1

Views: 717

Answers (2)

tiftik

Reputation: 988

You can add (?=.) at the beginning of your regex.
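
One thing worth checking before relying on this: (?=.) only asserts that some character follows, so it rules out a match at the very end of the input, but the rest of the pattern can still match with zero length in front of something like 3 + 4. The sketch below compares it with a stricter lookahead, (?=[0-9]+\s*['"s]), which is an assumption of this example and not part of the answer. Whether Jison's lexer accepts lookahead at all should be verified against its documentation.

```javascript
// Body of the FIS rule, with Jison's quoted literals written as bare characters.
const body = String.raw`([0-9]+\s*')?\s*([0-9]+\s*")?\s*([0-9]+\s*s)?`;

// Variant from the answer: require that at least one character follows.
const withDot = new RegExp("(?=.)" + body, "y");

// Stricter variant (an assumption, not from the answer): require a number
// followed by one of the unit markers before attempting the match.
const withUnit = new RegExp(String.raw`(?=[0-9]+\s*['"s])` + body, "y");

// Hypothetical helper: match at the start of the input, or return null.
function matchAtStart(re, input) {
  re.lastIndex = 0;
  const m = re.exec(input);
  return m === null ? null : m[0];
}

for (const input of ["", "3 + 4", `4"5s`]) {
  console.log(
    JSON.stringify(input),
    JSON.stringify(matchAtStart(withDot, input)),
    JSON.stringify(matchAtStart(withUnit, input))
  );
}
```

With the stricter lookahead, 3 + 4 yields no match at all, so a lexer would fall through to the NUMBER rule, while 4"5s is still matched in full.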

Upvotes: 1

Jon

Reputation: 16728

The problem is that everything in your first line is optional - either ? (0 or 1) or * (0 or more).

I'm not too familiar with the imperial system (I've never seen sixteenths before...), but perhaps something like

([0-9]+\s*["'s])+    (with whatever escaping is necessary for the " and ' - I'm not a javascript guy)

This ensures that it doesn't match an empty string, because at least one number-plus-unit pair is required. The drawback is that it would also allow something like 5s 4" 6', which is probably not quite what you want...
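
One way to keep the feet, inches, sixteenths order while still requiring at least one part is to spell out the alternatives by their first mandatory unit. The following is a minimal sketch in plain JavaScript, not a tested Jison rule; the variable names and the helper lexAt are made up for illustration, and the quoting would need adapting to Jison's literal syntax.

```javascript
// Building blocks: a number, optional whitespace, then the unit marker.
const num = String.raw`[0-9]+\s*`;
const feet = num + "'";
const inches = num + '"';
const sixteenths = num + "s";
const ws = String.raw`\s*`;

// Each alternative starts with a mandatory part, so no branch can be empty:
//   feet [inches] [sixteenths] | inches [sixteenths] | sixteenths
const fisSource =
  `(?:${feet}(?:${ws}${inches})?(?:${ws}${sixteenths})?` +
  `|${inches}(?:${ws}${sixteenths})?` +
  `|${sixteenths})`;
const orderedFis = new RegExp(fisSource, "y");

// Hypothetical helper: match at position pos, or return null.
function lexAt(re, input, pos) {
  re.lastIndex = pos;
  const m = re.exec(input);
  return m === null ? null : m[0];
}

console.log(JSON.stringify(lexAt(orderedFis, `6' 4" 5s`, 0))); // whole value
console.log(JSON.stringify(lexAt(orderedFis, `5s 4" 6'`, 0))); // only the leading 5s
console.log(JSON.stringify(lexAt(orderedFis, "3 + 4", 0)));    // no match
```

For the out-of-order input only 5s is consumed as a single token, so the remaining parts would be lexed as separate tokens and the grammar can reject the sequence. The groups are non-capturing here; plain parentheses work the same way if that reads more clearly.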

Upvotes: 0
