Richard Walton

Reputation: 4785

Regex index in matching string where the match failed

I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?

For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.

Edit for clarification: the reason I need this is to simplify the parsing component of my application. The application is an Assembly language teaching tool which allows students to write, compile, and execute assembly-like programs.

Currently I have a tokenizer class which converts input strings into Tokens using regexes. This works very well. For example:

Given the input "INP :x:", the tokenizer would produce the following tokens:

Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL

These tokens are then analysed to ensure they conform to the syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages. For example:

if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}

I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.
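To illustrate the kind of position reporting meant here, the following is a minimal sketch that compares a token list against an expected sequence and returns the index of the first mismatch. The `Token` enum mirrors the names above; `SyntaxChecker` and `FirstMismatch` are hypothetical names introduced only for this example, and a real syntax with optional or repeated tokens would need more than a fixed sequence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds, mirroring the names used in the question.
public enum Token { OPCODE, WHITESPACE, LABEL, EOL }

public static class SyntaxChecker
{
    // Returns the index of the first token that differs from the expected
    // sequence, or -1 when the two sequences are identical.
    public static int FirstMismatch(IReadOnlyList<Token> actual, IReadOnlyList<Token> expected)
    {
        int shared = Math.Min(actual.Count, expected.Count);
        for (int i = 0; i < shared; i++)
        {
            if (actual[i] != expected[i]) return i;
        }
        // One sequence is a prefix of the other: the error sits where the shorter one ends.
        return actual.Count == expected.Count ? -1 : shared;
    }
}
```

With the expected sequence OPCODE, WHITESPACE, LABEL, EOL, a statement whose third token is not a LABEL would be reported at index 2, which is the position the IF statement above checks by hand.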

Upvotes: 7

Views: 4303

Answers (5)

Tono Nam

Reputation: 36058

It is not possible to tell where a regex fails, so you need to take a different approach: compare strings. Use a regex to remove all the parts that could vary, then compare the result with the string that you know does not change.

I ran into the same problem, came across your question, and had to work out my own solution. Here it is:

https://stackoverflow.com/a/11730035/637142

Hope it helps.

Upvotes: 0

Michael Carman

Reputation: 30831

In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.

Upvotes: 0

torial

Reputation: 13121

I agree with Colin Younger; I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:

  1. Get the Regex class source code (e.g. http://www.codeplex.com/NetMassDownloader to download the .Net source).
  2. Change the code to have a readonly property with the failure index.
  3. Make sure your code uses that Regex rather than Microsoft's.

Upvotes: 4

ColinYounger

Reputation: 6865

I don't believe it's possible, but I am intrigued why you would want it.

Upvotes: 0

Massimiliano

Reputation: 16980

I guess such an index would only have meaning in some simple case, like in your example.

If you take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", what should the index you are talking about be? It will depend on the algorithm used for matching. It could fail on "abbbc..." or on "abbbcbbc...".
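For the simple case, where the pattern is a plain sequence of parts, one possible workaround is to match growing prefixes of the pattern and report how far the last successful prefix reached. This is only a sketch under that assumption, not a feature of the Regex class; `RegexFailure`, `FailureIndex`, and the idea of passing the pattern as separate parts are hypothetical and introduced just for this example.

```csharp
using System.Text.RegularExpressions;

public static class RegexFailure
{
    // Matches the parts in sequence, anchored at the start of the input.
    // Returns -1 if every part matches; otherwise returns the length matched
    // by the longest prefix of parts that still succeeded.
    // Trailing input after the last part is not checked.
    public static int FailureIndex(string input, params string[] parts)
    {
        int reached = 0;
        string prefix = "";
        foreach (string part in parts)
        {
            // Wrap each part in a non-capturing group so alternations stay contained.
            prefix += "(?:" + part + ")";
            Match m = Regex.Match(input, @"\A" + prefix);
            if (!m.Success) return reached;
            reached = m.Length;
        }
        return -1;
    }
}
```

Calling `RegexFailure.FailureIndex("abd", "a", "b", "c")` returns 2, as in the question's example. Once quantifiers and backtracking are involved, the reported index depends on how the pattern is split into parts, which is exactly the ambiguity described above.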

Upvotes: 1
