krishna mohan

Reputation: 49

Unable to convert a PHP regex to a C# regex

I want to count links that contain a special symbol (an underscore). I have written a regex that works fine in an online PHP regex editor, but it does not work in C# code:

<
  (?<Tag_Name>(a)|img)\b
  [^>]*?
  \b(?<URL_Type>(?(2)href|src))
  \s*=\s*
  (?:"(?<URL>(?:\\.|[^\\"_#?&]++)*(?:_|(?<Query>[#?&]))(?:\\.|[^"\\]++)*)"
  |  '(?<URL>(?:\\.|[^\\'_#?&]++)*(?:_|(?<Query>[#?&]))(?:\\.|[^'\\]++)*)')

but in C# code it gives a compilation error:

MatchCollection underscoreLinks = Regex.Matches(strIn, "<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(2)href|src)) \s*=\s*(?:"(?<URL>(?:\\.|[^\\"_#?&]++)*(?:_|(?<Query>[#?&]))(?:\\.|[^"\\]++)*)"|  '(?<URL>(?:\\.|[^\\'_#?&]++)*(?:_|(?<Query>[#?&]))(?:\\.|[^'\\]++)*)')", RegexOptions.IgnoreCase | RegexOptions.Multiline);

Upvotes: 1

Views: 989

Answers (1)

Mariano

Reputation: 6511

There are some things you need to correct:

  1. You're using single backslashes in a regular string literal. The C# compiler interprets these as escape sequences before the pattern ever reaches the regex engine. For example, \b becomes a backspace character, and \s is not a valid C# escape at all, which is one source of the compilation error. Use a verbatim string instead, i.e. @"pattern".
  2. You have unescaped double quotes in the string. In a verbatim string, escape each one by doubling it: @"the ""pattern"" with quotes".
  3. .NET does not support possessive quantifiers. Use an atomic group instead, i.e. change [^\\"_#?&]++ to (?>[^\\"_#?&]+).
  4. You can keep the same multi-line layout, with whitespace ignored, by using RegexOptions.IgnorePatternWhitespace.
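A minimal sketch of points 1 to 3 as a plain console app. The class EscapingDemo and its members are illustrative names and are not part of the answer. It shows what the compiler does to a regular string literal versus a verbatim one, how to write quotes in a verbatim string, and that .NET rejects the possessive form while accepting the equivalent atomic group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    // Regular literal: the compiler turns "\b" into one backspace char (U+0008),
    // so the regex engine never sees a word boundary.
    public static readonly string Regular = "\b";

    // Verbatim literal: @"\b" stays two chars (backslash + 'b'), i.e. a word boundary.
    public static readonly string Verbatim = @"\b";

    // Inside a verbatim literal a double quote is written twice: this is the 3-char string "x".
    public static readonly string Quoted = @"""x""";

    // Atomic group: the same no-backtracking behaviour that [^_]++ gives in PCRE.
    public static readonly Regex Atomic = new Regex(@"(?>[^_]+)_");

    // .NET rejects the possessive form "++" when the pattern is parsed.
    public static bool PossessiveRejected()
    {
        try
        {
            _ = new Regex(@"[^_]++");
            return false;
        }
        catch (ArgumentException)
        {
            return true; // pattern parse error
        }
    }

    public static void Main()
    {
        Console.WriteLine(Regular.Length);             // 1
        Console.WriteLine(Verbatim.Length);            // 2
        Console.WriteLine(Quoted);                     // "x"
        Console.WriteLine(Atomic.IsMatch("foo_bar"));  // True
        Console.WriteLine(PossessiveRejected());       // True
    }
}
```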

string pattern = @"
    <
      (?<Tag_Name>(a)|img)\b
      [^>]*?
      \b(?<URL_Type>(?(1)href|src))   # .NET numbers unnamed groups first, so (a) is group 1 here
      \s*=\s*
      (?:""(?<URL>(?>\\.|[^\\""_#?&]+)*(?:_|(?<Query>[#?&]))(?>\\.|[^""\\]+)*)""
      |  '(?<URL>(?>\\.|[^\\'_#?&]+)*(?:_|(?<Query>[#?&]))(?>\\.|[^'\\]+)*)')
    ";

Regex re = new Regex( pattern, 
                      RegexOptions.IgnoreCase | RegexOptions.Multiline
                      | RegexOptions.IgnorePatternWhitespace);

MatchCollection underscoreLinks = re.Matches(text);
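A self-contained sketch that wraps the pattern above in a small console program and counts the matches. The class name UnderscoreLinks, the Count helper, and the sample HTML are illustrative assumptions. One detail is written out explicitly: the conditional is (?(1)href|src). .NET numbers unnamed groups before named ones, so the unnamed (a) group is group 1 in .NET, whereas in PCRE it is group 2 and Tag_Name is group 1. Also note that the pattern counts URLs containing #, ? or & through the Query alternative, not only underscores.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnderscoreLinks
{
    // The answer's pattern, with the conditional testing group 1: .NET numbers
    // unnamed groups before named ones, so the unnamed (a) group is group 1 here.
    private const string Pattern = @"
        <
          (?<Tag_Name>(a)|img)\b
          [^>]*?
          \b(?<URL_Type>(?(1)href|src))
          \s*=\s*
          (?:""(?<URL>(?>\\.|[^\\""_#?&]+)*(?:_|(?<Query>[#?&]))(?>\\.|[^""\\]+)*)""
          |  '(?<URL>(?>\\.|[^\\'_#?&]+)*(?:_|(?<Query>[#?&]))(?>\\.|[^'\\]+)*)')
        ";

    public static readonly Regex Re = new Regex(
        Pattern,
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    // Counts <a href> / <img src> values containing '_' (or '#', '?', '&' via the Query group).
    public static int Count(string html) => Re.Matches(html).Count;

    public static void Main()
    {
        string html =
            "<a href=\"/my_page.html\">x</a>" +
            "<a href=\"/plain.html\">y</a>" +
            "<img src='pic_1.png'>" +
            "<a href=\"/search?q=1\">z</a>";

        Console.WriteLine(Count(html));                          // 3
        Console.WriteLine(Re.GroupNumberFromName("Tag_Name"));   // 2
    }
}
```

A port that keeps (?(2)href|src) still compiles. However, in .NET group 2 is Tag_Name, which has always matched by that point. Every tag would therefore require href, and the <img src='pic_1.png'> tag in the sample would not be counted.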

ideone demo

Upvotes: 3
