Reputation: 184
I'm trying to use regular expressions in Delphi to extract some data from an HTML document.
My objective is to create a query string with the following syntax:
?namedGroup1=valueNamedGroup1&namedGroup2=valueNamedGroup2
I have an array of regexes:
array[0] = '<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+))"';
My HTML:
<h1>bla bla bla</h1> <div id="home">
If I apply this regex with PHP's built-in regex functions, it returns an associative array:
RegArray[0] = '<div id="home">'
RegArray['id'] = 'home'
With a foreach I can easily get the list of named groups and build my query string:
?id=home
The thing is that I don't know in advance whether the regex will match the named group id or name, and I need to know which one it was.
Delphi only returns a simple array:
RegArray[0] = '<div id="home">'
RegArray[1] = 'home' // ID or NAME?
So how do I get the name of the group together with its value?
Here is my code:
var
  RegEx: TRegEx;
  Match: TMatch;
begin
  RegEx := TRegEx.Create(array[0], [roIgnoreCase, roMultiLine]);
  Match := RegEx.Match(html);
  if Match.Success then
  begin
    // get the group here
  end;
end;
I also tried this class: http://www.regular-expressions.info/delphi.html
but had no success with it either.
Upvotes: 4
Views: 3185
Reputation: 2552
TRegEx (from System.RegularExpressions) is a wrapper around TPerlRegEx (from System.RegularExpressionsCore), which is a wrapper around the open source PCRE library.
PCRE itself of course supports retrieving the names of groups, but neither wrapper exposes them.
Possible solutions:
- call the PCRE API directly (unit System.RegularExpressionsAPI), e.g. pcre_fullinfo(TPerlRegEx.FPattern, ...)
- use JclPCRE from the open source JEDI Code Library (JCL): Name1 := TJclRegEx.CaptureNames[1];
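If neither route is convenient, a cruder workaround (a different technique, not an API offered by TRegEx or PCRE) is to scan the pattern text itself for (?<name> declarations and then look each name up with Match.Groups[...]. A minimal sketch, assuming group names are only ever written as (?<name>...) and never appear as escaped literal text; GroupNamesOf is a hypothetical helper:

```pascal
program GroupNamesFromPattern;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

// Hypothetical helper: lists the names declared as (?<name>...) in Pattern.
// Assumptions: the (?P<name>...) and (?'name'...) spellings are not used, and
// no escaped "\(?<" appears as literal text. Lookbehinds "(?<=" and "(?<!"
// are skipped because a group name must start with a letter or underscore.
function GroupNamesOf(const Pattern: string): TArray<string>;
var
  M: TMatch;
begin
  Result := nil;
  for M in TRegEx.Matches(Pattern, '\(\?<([A-Za-z_][A-Za-z0-9_]*)>') do
  begin
    SetLength(Result, Length(Result) + 1);
    Result[High(Result)] := M.Groups[1].Value;
  end;
end;

const
  cRegEx = '<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+)")';
var
  Name: string;
begin
  // Print every group name declared in the pattern, in declaration order.
  for Name in GroupNamesOf(cRegEx) do
    Writeln(Name);
end.
```

For the (corrected) pattern from the question this should list id and name; each of them can then be passed to Match.Groups[...] to fetch its value.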
Upvotes: 1
Reputation: 16065
I think you made a mistake in your pattern: look at its last two characters; it is clearly unbalanced! Looks like the copy-paste from PHP went wrong ;-)
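Your pattern:
<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+))"
should be:
<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+)")
In the first one the closing quote of the name branch sits outside the group, so the id branch needs a second quote after the value (id="home"") and <div id="home"> never matches.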
Tested using the pcre.org engine and the interactive editor from http://www.yunqa.de/delphi/doku.php/products/regex/index
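The same check can be run from Delphi itself by testing both patterns against the sample HTML with TRegEx.IsMatch. A minimal console sketch (the program and constant names are arbitrary); the broken pattern should report no match and the corrected one a match:

```pascal
program BrokenVsFixedPattern;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  cHtml = '<h1>bla bla bla</h1> <div id="home">';
  // Pattern as posted: the last quote is outside the alternation group.
  cBroken = '<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+))"';
  // Corrected pattern: each branch ends with its own closing quote.
  cFixed = '<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+)")';
begin
  Writeln('broken: ', TRegEx.IsMatch(cHtml, cBroken));
  Writeln('fixed:  ', TRegEx.IsMatch(cHtml, cFixed));
end.
```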
You wrote: "I also tried this class: http://www.regular-expressions.info/delphi.html"
That page immediately points to another interactive editor that you could use to debug your regex: http://www.regexbuddy.com/test.html
I wonder why you didn't try it...
Still, I think an HTML parser would be both faster and more reliable. Consider HTML fragments like
<!-- <p><div name="bla-bla"> ... </div></p> -->
or like
<img src="...." alt='Press to insert <div id="123"> to you sample text' />
or like
<DIV ID="my cool id" />
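Two of these cases can be checked against the corrected pattern in a short sketch: the div that only exists inside the alt attribute is picked up as a false positive (id = 123), while the real DIV whose id contains spaces is not matched at all, even with roIgnoreCase. Program and constant names are illustrative only:

```pascal
program RegexHtmlPitfalls;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  cRegEx = '<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+)")';
  // The "div" here is just text inside an <img> alt attribute.
  cInAttribute = '<img src="...." alt=''Press to insert <div id="123"> to you sample text'' />';
  // A real div, but its id contains spaces, which [a-zA-Z0-9]+ rejects.
  cSpacedId = '<DIV ID="my cool id" />';
var
  M: TMatch;
begin
  // False positive: the regex cannot tell markup from attribute text.
  M := TRegEx.Match(cInAttribute, cRegEx, [roIgnoreCase]);
  Writeln(M.Success, ' ', M.Groups['id'].Value);
  // False negative: the element exists but the pattern does not match it.
  Writeln(TRegEx.IsMatch(cSpacedId, cRegEx, [roIgnoreCase]));
end.
```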
The topic starter posted his own answer below, which consists mostly of questions to me.
"The problem is not the Regex, ..."
Just count the quotes and angle brackets with pen and paper, noting the order in which they are opened and closed. Your pattern is ( ... " ... ) ... ", which is unbalanced!
"... is the Delphi."
Delphi the language has nothing to do with regexps; the libraries and components do. So that claim makes no sense. You could argue that you tested broken libraries, but not the language itself.
"My regex with PHP works fine, ..."
That means either you have a different regex pattern in PHP (you did not post the PHP source here) or "the problem is in PHP".
Actually, we have seen neither the Delphi source nor the PHP source.
array[0] = '<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+))"';
is, I think, not a correct line in either of them.
So I don't think the code and patterns in your PHP and Delphi programs match each other. Please quote the real code being used.
"The thing is that DELPHI doesn't return me <name, value> pair for it. Also, I can't change the whole system to use a HTML parser, the regex is already working"
Then you need to adapt your regex to correctly handle the HTML snippets I showed above.
Upvotes: 2
Reputation: 47768
I am not sure about enumerating named groups, but you can access the group either by its index or by its name:
program NamedGroupDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  cRegEx = '<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+)")';
  cHtml = '<h1>bla bla bla</h1> <div id="home">';
var
  group: TGroup;
  match: TMatch;
  regEx: TRegEx;
begin
  regEx := TRegEx.Create(cRegEx, [roIgnoreCase, roMultiLine]);
  match := regEx.Match(cHtml);
  if match.Success then
  begin
    // Groups accepts either an index (here match.Groups[2]) or a name.
    group := match.Groups['id'];
    Assert(group.Value = 'home');
  end;
end.
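Since the names declared in the pattern are known to whoever wrote it, one way to answer "id or name?" is to try each candidate name and keep only the groups that actually captured something. A minimal sketch: TryGetGroup and BuildQueryString are hypothetical helpers, and the candidate names are hard-coded from the pattern. The try/except and the Length check are defensive assumptions. Depending on the Delphi version, asking for a group above the last participating one may raise an exception, and TGroup.Success may reflect the whole match rather than the single group, so the code tolerates either behaviour:

```pascal
program NamedGroupsToQuery;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  cRegEx = '<div (id="(?<id>[a-zA-Z0-9]+)"|name="(?<name>[a-zA-Z0-9]+)")';
  // Names declared in cRegEx, listed by hand (assumption: kept in sync).
  cNames: array[0..1] of string = ('id', 'name');

// Hypothetical helper: True plus the captured text if group Name took part in M.
function TryGetGroup(const M: TMatch; const Name: string; out Value: string): Boolean;
var
  G: TGroup;
begin
  Value := '';
  try
    G := M.Groups[Name];
    // Length > 0 guards against Success being True for a group that did not
    // participate; safe here because [a-zA-Z0-9]+ never captures ''.
    Result := G.Success and (G.Length > 0);
    if Result then
      Value := G.Value;
  except
    // Assumed behaviour: an out-of-range group raises an exception.
    on Exception do
      Result := False;
  end;
end;

// Hypothetical helper: '?name1=value1&name2=value2' for the groups that matched.
// Values are not URL-encoded; the sample pattern only captures [a-zA-Z0-9].
function BuildQueryString(const Html: string): string;
var
  M: TMatch;
  Name, Value: string;
begin
  Result := '';
  M := TRegEx.Match(Html, cRegEx, [roIgnoreCase, roMultiLine]);
  if not M.Success then
    Exit;
  for Name in cNames do
    if TryGetGroup(M, Name, Value) then
    begin
      if Result = '' then
        Result := '?'
      else
        Result := Result + '&';
      Result := Result + Name + '=' + Value;
    end;
end;

begin
  Writeln(BuildQueryString('<h1>bla bla bla</h1> <div id="home">'));
  Writeln(BuildQueryString('<div name="main">'));
end.
```

Hard-coding cNames keeps the sketch short; if the list of patterns grows, the names could instead be read out of each pattern string.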
Upvotes: 0