Umair Ahmed

Reputation: 2423

Porting from C# to Delphi, Regex incompatibility

I am new to HTTP and Regex. I have a piece of code that I ported to Delphi, and it only partially works. The exception 'lookbehind not of fixed length' is raised on one particular statement:

'(?<=image\\?c=)[^\"]+'

The statement is there to extract an image link from an HTML form. After some research here and on the web, I have come to understand that the '+' at the end causes this in some implementations of Regex. What I couldn't find was how to change it so that it works in Delphi's implementation. As the code works in C#, can somebody help and explain?

Upvotes: 0

Views: 209

Answers (1)

Rob Kennedy

Reputation: 163357

The lookbehind section doesn't have fixed length. That has nothing to do with the + at the end. The lookbehind portion is (?<=image\\?c=). You copied that from C#. In C#, the regex wants to look for a literal question mark. That's a special character in regex, so it needs a backslash in front of it. Backslash is special in C# strings, though, so that backslash needs another backslash, all just to represent a single question mark.

In Delphi strings, backslashes aren't special, so the two of them are treated as a literal backslash to search for in the regex. The question mark isn't escaped, so the Delphi regex treats it as an instruction to make the literal backslash optional. The optional character makes the lookbehind have variable length.

To solve this, simply remove one backslash, so that the lookbehind reads (?<=image\?c=) and the question mark is matched literally.

You can also remove the one before the quotation mark, but it should have no effect since quotation marks aren't special in regex.
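As a minimal sketch of the fix, the console program below tries both forms of the pattern. It assumes a Delphi version that ships the System.RegularExpressions unit (XE2-style unit names); the sample input string and the program name are made up for illustration and are not from the original code.

```delphi
program LookbehindDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

const
  // Hypothetical input, invented for this example.
  SampleHtml = '<img src="image?c=abc123">';

  // Copied verbatim from C#: in a Delphi string both backslashes survive,
  // so the regex engine sees an optional literal backslash in the lookbehind.
  BrokenPattern = '(?<=image\\?c=)[^\"]+';

  // One backslash removed: "\?" is now a literal question mark,
  // and the lookbehind has a fixed length.
  FixedPattern = '(?<=image\?c=)[^"]+';

var
  M: TMatch;
begin
  try
    M := TRegEx.Match(SampleHtml, BrokenPattern);
    Writeln('Broken pattern compiled; match found: ', M.Success);
  except
    // The variable-length lookbehind is expected to be rejected here.
    on E: Exception do
      Writeln('Broken pattern failed: ', E.Message);
  end;

  M := TRegEx.Match(SampleHtml, FixedPattern);
  if M.Success then
    Writeln('Fixed pattern matched: ', M.Value) // abc123 for the sample input
  else
    Writeln('Fixed pattern found no match');
end.
```

The only difference between the two constants that matters is the single backslash before the question mark; the backslash before the quotation mark is dropped in the fixed version purely for readability.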

Even if you use an HTML parser to identify the HTML element that contains this URL fragment, you may still need the right regex to recognize which HTML element is your target.

Upvotes: 4
