Reputation: 98
This is my regular expression:
$pattern_new="/<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(?<price>([0-9.]*)).*?)\$(.*?)(\n|\s)*?</";
This is the sample input I need to match against:
<td><strong>.zx</strong></td><td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s $399</td><td>zxcddcdcdcdc</td></tr><tr class="dark"><td><strong>.aa.rr</strong></td><td><span class="offer"><strong>xscre:<br></strong>$99 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&eae;s $199</td><td>xxxx</td></tr><tr class="bar"><td colspan="3"></td></tr><tr class="bright"><td><strong>.vfd</strong></td><td><span class="offer"><strong>xscre:<br></strong>$99 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>du&ee;s $199</td><td>xxxxxxxx</td></tr><tr class="dark"><td><strong>.qwe</strong></td><td><span class="offer"><strong>xxx<br></strong>$99 xxxc;o<span class="fineprint_number">2</span>
Here is what I am doing in PHP:
$pattern_new="/<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(<price>)*([0-9.]*).*?)\$(.*?)(\n|\s)*?</";
$source = file_get_contents("https://www.abc.com/sources/data.txt");
preg_match_all($pattern_new, $source, $match_newprice, PREG_PATTERN_ORDER);
echo $source;
print_r($match_newprice);
$match_newprice is returning an empty array.
When I use a regex tester such as myregextester or solmetra.com, I get a perfect match with no issues at all, but when I use PHP's preg_match_all to do the match, it returns an empty array. I increased pcre.backtrack_limit, but the issue remains.
I don't seem to understand the problem. Any help would be much appreciated.
Upvotes: 1
Views: 297
Reputation: 89557
The good way to do that:
$oProductsHTML = new DOMDocument();
@$oProductsHTML->loadHTML($sHtml);
$oSpanNodes = $oProductsHTML->getElementsByTagName('span');
foreach ($oSpanNodes as $oSpanNode) {
    if (preg_match('~\boffer\b~', $oSpanNode->getAttribute('class')) &&
        preg_match('~\$\K\d++~', $oSpanNode->nodeValue, $aMatch))
    {
        $sPrice = $aMatch[0];
        echo '<br/>' . $sPrice;
    }
}
$sHtml stands for your string.
I'm sure you can make it shorter with XPath.
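A rough sketch of what the XPath variant could look like, using PHP's DOMXPath. The sample markup is a shortened, hypothetical copy of the rows in the question, and $aPrices is a name introduced here for illustration. libxml_use_internal_errors() is used instead of the @ operator to keep parse warnings quiet.

```php
<?php
// Hypothetical sample input, shortened from the markup in the question.
$sHtml = '<table><tr><td><strong>.zx</strong></td>'
       . '<td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx'
       . '<span class="fineprint_number">2</span></span><br>des $399</td></tr>'
       . '<tr><td><strong>.aa.rr</strong></td>'
       . '<td><span class="offer"><strong>xscre:<br></strong>$99 xxxxx</span>'
       . '<br>des $199</td></tr></table>';

$oDoc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$oDoc->loadHTML($sHtml);
libxml_clear_errors();

$oXPath = new DOMXPath($oDoc);
// Select every <span> whose class attribute contains the token "offer".
$sQuery = '//span[contains(concat(" ", normalize-space(@class), " "), " offer ")]';

$aPrices = [];
foreach ($oXPath->query($sQuery) as $oSpanNode) {
    // Same price regex as above: digits right after a literal dollar sign.
    if (preg_match('~\$\K\d++~', $oSpanNode->nodeValue, $aMatch)) {
        $aPrices[] = $aMatch[0];
    }
}
print_r($aPrices);
```

The class test moves into the XPath query, so only the price extraction is left to the regex.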
The bad way:
$sPattern = '~<span class="offer\b(?>[^>]++|>(?!\$))+>\$\K\d++~';
preg_match_all($sPattern, $sHtml, $aMatches);
print_r ($aMatches[0]);
Note: \d++ can be replaced by \d++(?>\.\d++)? to allow decimal numbers.
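For illustration, here is the same pattern with the decimal part added, run against a made-up two-span input (the $299.95 value is an assumption, not taken from the question):

```php
<?php
// Hypothetical input: one decimal price and one whole-number price.
$sHtml = '<span class="offer"><strong>xscre:<br></strong>$299.95 xxxxx</span>'
       . '<span class="offer"><strong>xscre:<br></strong>$99 xxxxx</span>';

// \d++(?>\.\d++)? matches the integer part plus an optional fractional part.
$sPattern = '~<span class="offer\b(?>[^>]++|>(?!\$))+>\$\K\d++(?>\.\d++)?~';

preg_match_all($sPattern, $sHtml, $aMatches);
print_r($aMatches[0]);
```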
Upvotes: 1
Reputation: 70490
Another problem with this, which is PHP-related:
<?php
echo "\$".PHP_EOL;
echo '\$'.PHP_EOL;
Result:
$
\$
... as in double-quoted strings the $ is expected to signify the start of a variable, and needs escaping if you mean a bare $. That means the \$ in your double-quoted pattern reaches the regex engine as a plain $, which is an end-of-subject anchor rather than a literal dollar sign. Put single quotes around your regex and it will probably be fine (I haven't looked at it in detail, though; you may want to use the /x option and add some formatting whitespace/comments if you need to debug this half a year from now).
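A minimal sketch of both suggestions, assuming a simplified pattern rather than the exact one from the question. The ~ delimiter, the two numbered groups and the one-row sample are choices made here for illustration.

```php
<?php
// Double quotes: PHP itself consumes the backslash, so PCRE receives a bare $.
var_dump("\$" === '$');

// Hypothetical one-row sample, shortened from the markup in the question.
$source = '<td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx'
        . '<span class="fineprint_number">2</span></span><br>des $399</td>';

// Single quotes keep \$ intact. The x flag ignores pattern whitespace and
// allows # comments; the s flag lets . match newlines.
$pattern = '~
    <td> \s*
    <span .*? </strong> \s*    # skip the label in front of the first price
    \$ ([0-9.]+)               # group 1: price right after the label
    .*?
    \$ ([0-9.]+)               # group 2: second price in the same cell
~xs';

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);
print_r($matches);
```

With PREG_SET_ORDER each element of $matches is one complete match, so $matches[0][1] and $matches[0][2] hold the two prices of the first row.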
Upvotes: 1
Reputation: 19076
I assume you were trying to write a non-capturing group for <price... but missed the :. Otherwise, you should take out the question mark. If the price group is optional, try the regex below. You should use the following website to help you with regex; I find it extremely helpful.
<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(<price>)*([0-9.]*).*?)\$(.*?)(\n|\s)*?<
In the above example, your first match would have the following captures:
0: "<td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s $399<"
1: ""
2: "<span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s "
3: ">"
4: ""
5: ""
6: "299"
7: "399"
8: ""
Is this what you are looking for?
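As a side note, PHP's PCRE also accepts (?<price>...) as a named capture group, so the first pattern in the question may have been meant that way. A small sketch of the difference between the two spellings, on a made-up fragment of the input:

```php
<?php
// Hypothetical fragment of the input from the question.
$source = '</strong>$299 xxxxx<br>des $399</td>';

// (?<price>...) is a named capture group: the value is stored under "price"
// and also under its numeric index.
preg_match('~\$(?<price>[0-9.]+)~', $source, $named);

// (?:...) groups without capturing: only the whole match is returned.
preg_match('~\$(?:[0-9.]+)~', $source, $plain);

print_r($named);
print_r($plain);
```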
Upvotes: 2