Reputation: 57670
I have this huge regex for matching credit card numbers. It is written for PCRE and works flawlessly in PHP.
/(\d{13,16})(?(?=<)<|["']).*?(?=(?(?=>)>|["\'])\d{3,4}(?(?=<)<|["']))(?(?=>)>|["'])(\d{3,4})(?(?=<)<|["'])/is
// /i = ignore case
// /s = treat the subject as a single line
I converted it to .NET by adding @ at the beginning and doubling the double quotes. I think that is the proper procedure.
@"(\d{13,16})(?(?=<)<|[""]).*?(?=(?(?=>)>|[""])\d{3,4}(?(?=<)<|[""]))(?(?=>)>|[""])(\d{3,4})(?(?=<)<|[""])"
Now it doesn't match. I know the PCRE and .NET implementations might not be the same, but I thought I could convert the pattern to a compatible one. I looked it up in the MSDN reference, and my pattern does not seem to contain anything PCRE-specific.
After analyzing the pattern, I found that (?(?=<)<|[""]) is not matching. So I made the regular expression simpler. It is now @"(?(?=q)qu|\w)\w+", and I am matching it against "Queen, Quick, Qi etc".
$data = "Queen, Quick, Qi etc";
$pattern = "(?(?=q)qu|\w)\w+";
preg_match_all("/$pattern/is", $data, $matches);
print_r($matches);
Array
(
    [0] => Array
        (
            [0] => Queen
            [1] => Quick
            [2] => etc
        )
)
string data = "Queen, Quick, Qi etc";
string pattern = @"(?(?=q)qu|\w)\w+";
Regex re = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
foreach (Match m in re.Matches(data))
{
    if (m.Success)
    {
        //Console.WriteLine("Credit Card Number={0}, CCV={1}", m.Groups[1].Value, m.Groups[6].Value);
        for (int i = 1; i < m.Groups.Count; i++)
        {
            Console.WriteLine("[{0}][{1}]", i, m.Groups[i].Value);
            for (int j = 0; j < m.Groups[i].Captures.Count; j++)
            {
                Console.WriteLine("[{0}][{1}][{2}]", i, m.Groups[i].Value, m.Groups[i].Captures[j].Value);
            }
        }
    }
}
Press any key to continue . . .
Output is nothing.
My questions are:
How should I change @"(?(?=q)qu|\w)\w+" so that it matches in .NET just like it does in PHP? Thanks
Reputation: 336418
1.: Conditionals work in .NET just as they do in PHP.
2.: The "simpler" regex is correct for .NET. You're just using it wrong:
You have no capturing groups in your regex. That means that the loop
for (int i = 1; i < m.Groups.Count; i++) {...}
is never executed because m.Groups.Count is 1.
The correct way would be something like
foreach (Match m in re.Matches(data))
{
    if (m.Success)
    {
        for (int i = 0; i < m.Groups.Count; i++) // Groups are numbered from zero
        {
            // Groups[0] is the entire match
            Console.WriteLine("[{0}][{1}]", i, m.Groups[i].Value);
        }
    }
}
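As a quick check of points 1 and 2, here is a minimal, self-contained sketch. The class name and the output format are made up for illustration. It runs the simplified pattern from the question with the same options and prints Groups.Count next to each match:

```csharp
using System;
using System.Text.RegularExpressions;

class ConditionalDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        string data = "Queen, Quick, Qi etc";

        // Same pattern and options as in the question.
        Regex re = new Regex(@"(?(?=q)qu|\w)\w+",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        foreach (Match m in re.Matches(data))
        {
            // The pattern has no capturing groups, so Groups holds
            // only Groups[0], which is the entire match.
            Console.WriteLine("Groups.Count={0}, match={1}", m.Groups.Count, m.Value);
        }
    }
}
```

This should list Queen, Quick and etc, the same matches PHP returns, each with Groups.Count=1. That is why a loop starting at i = 1 prints nothing.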
3.: Your regex is missing the single quotes.
Regex regexObj = new Regex(@"(\d{13,16})(?(?=<)<|[""']).*?(?=(?(?=>)>|[""'])\d{3,4}(?(?=<)<|[""']))(?(?=>)>|[""'])(\d{3,4})(?(?=<)<|[""'])", RegexOptions.Singleline);
would be a literal translation.
4.: You don't need the /i or RegexOptions.IgnoreCase option, as there are no letters in your regex.
5.: (?(?=<)<|["']) makes no sense. It matches exactly the same text as [<"']. After all, it means "if there is a <, then match a <; otherwise, try to match a " or a '". There is no need to use a conditional regex at all.
So the entire regex can be simplified to
(\d{13,16})[<"'].*?(?=[>"']\d{3,4}[<"'])[>"'](\d{3,4})[<"']
6.: This shows another superfluous part of the regex more clearly: you have a lookahead assertion (?=[>"']\d{3,4}[<"']) that is followed by the exact same regex, [>"'](\d{3,4})[<"'], so the lookahead can be dropped entirely.
End result:
(\d{13,16})[<"'].*?[>"'](\d{3,4})[<"']
or, in C#:
Regex regexObj = new Regex(@"(\d{13,16})[<""'].*?[>""'](\d{3,4})[<""']", RegexOptions.Singleline);
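To convince yourself that the simplification does not change what is matched, something like the following sketch can be used. It runs the literal translation from point 3 and the simplified regex side by side. The class name and the sample inputs are made up for illustration (the digits are a placeholder test value, not real card data), and a handful of samples is only a sanity check, not a proof of equivalence:

```csharp
using System;
using System.Text.RegularExpressions;

class SimplifiedRegexDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        // Literal translation of the PCRE pattern (point 3).
        Regex original = new Regex(
            @"(\d{13,16})(?(?=<)<|[""']).*?(?=(?(?=>)>|[""'])\d{3,4}(?(?=<)<|[""']))(?(?=>)>|[""'])(\d{3,4})(?(?=<)<|[""'])",
            RegexOptions.Singleline);

        // Simplified pattern (end result).
        Regex simplified = new Regex(
            @"(\d{13,16})[<""'].*?[>""'](\d{3,4})[<""']",
            RegexOptions.Singleline);

        // Made-up sample inputs, assumed to resemble the data being scanned.
        string[] samples =
        {
            "<td>4111111111111111</td>\n<td>123</td>",
            "number='4111111111111111' cvv='123'",
            "no card data here"
        };

        foreach (string s in samples)
        {
            Match a = original.Match(s);
            Match b = simplified.Match(s);

            // Both patterns should agree on success and on the matched text.
            bool same = a.Success == b.Success && a.Value == b.Value;
            Console.WriteLine("same={0}, number={1}, code={2}",
                same, b.Groups[1].Value, b.Groups[2].Value);
        }
    }
}
```

Both patterns have exactly two capturing groups, so the number is in Groups[1] and the 3-4 digit code is in Groups[2].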