Reputation: 1
I'm trying to use (.+?)
to isolate the words "I. NEED. ISOLATION" in the source below:
<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
I. NEED. ISOLATION </font> </td>
using (.+?)
, I could do this:
$regex = '/stuff before(.+?)stuff after/';
and for this html, that would be:
$regex = '/<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
(.+?) </font> </td>/';
but it's choking on it because of incorrect escaping. I'm not great at PHP. Can someone please advise which characters I need to escape in HTML that looks like this?
<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
I. NEED. ISOLATION </font> </td>
Note that I'm not trying to design a regex pattern. I already have the pattern nailed down with (.+?)
, just need to know how to correctly escape the html so that php doesn't choke up on it.
Upvotes: 0
Views: 119
Reputation: 75232
As a matter of fact, there's nothing in that string that has special meaning in a regex (except the (.+?)
, of course). The only reason the /
is causing a problem is because you're using it as the regex delimiter. You just need to choose a different delimiter, like ~
for example:
$regex = '~<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
(.+?) </font> </td>~';
Upvotes: 0
Reputation: 17817
There is a function that does that for you. It's named preg_quote http://pl2.php.net/preg_quote. Pass the delimiter as the second argument, otherwise the / characters are left unescaped:
$regex = '/'.preg_quote('<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
', '/').'(.+?)'.preg_quote(' </font> </td>', '/').'/';
You should also be careful with case sensitivity and line breaks. I often add flags to my regexps to deal with that (i for case-insensitive matching, s so that the dot also matches newlines), so they look like /(.+?)/is
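To show how the delimiter argument and the flags fit together, here is a minimal sketch. The variable names and the \s* glue between the quoted pieces are my own assumptions, added so the pattern tolerates different line breaks; they are not part of the original snippet.

```php
<?php
// Sample input copied from the question.
$html = <<<HTML
<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
I. NEED. ISOLATION </font> </td>
HTML;

// preg_quote() escapes regex metacharacters; the second argument makes it
// escape the "/" delimiter as well. The \s* pieces (an assumption of this
// sketch) allow any whitespace or line breaks between the literal parts.
$before = preg_quote('<strong>Label:</strong></font></td>', '/')
        . '\s*'
        . preg_quote('<td valign="top" width="82%"> <font face="Arial" size="2">', '/')
        . '\s*';
$after  = '\s*' . preg_quote('</font> </td>', '/');

// i = case-insensitive, s = "." also matches newlines.
$regex = '/' . $before . '(.+?)' . $after . '/is';

// $value holds the captured text, or null when nothing matched.
$value = preg_match($regex, $html, $m) ? $m[1] : null;
echo $value, "\n";
```

If the surrounding markup varies more than in whitespace, this approach gets brittle quickly, and an HTML parser is probably the safer choice.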
Upvotes: 0
Reputation: 342393
$str=<<<EOF
<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
I. NEED. ISOLATION </font> </td>
EOF;
$s = explode("</font>",$str);
foreach($s as $k=>$v){
if(strpos($v,'<font face="Arial" size="2">') !== false){
$t=explode('<font face="Arial" size="2">',$v);
print trim($t[1])."\n";
}
}
output
$ php test.php
I. NEED. ISOLATION
Upvotes: 0
Reputation: 401022
First of all, you should really not use regular expressions to try to "parse" HTML -- which is not quite regular.
Going with something like DOMDocument::loadHTML
and some XPath query is generally a much better solution.
If you still want to go with a regex, you could for instance use a #
as the regex delimiter:
$str = <<<STR
<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
I. NEED. ISOLATION </font> </td>
STR;
$regex = '#<strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
(.+?) </font> </td>#';
if (preg_match($regex, $str, $m)) {
var_dump($m[1]);
}
This will get you:
string 'I. NEED. ISOLATION' (length=18)
Note the only thing I changed compared to your proposed code is the regex delimiter ;-)
Upvotes: 2
Reputation: 655269
If you’re using PCRE regular expressions, you need to escape the delimiters inside the regular expression (in your case the /
):
'/<strong>Label:<\/strong><\/font><\/td>
<td valign="top" width="82%"> <font face="Arial" size="2">
(.+?) <\/font> <\/td>/'
But probably more important: Regular expressions are not suitable for parsing HTML. Better use a proper HTML parser like the one provided by DOMDocument and query it with DOMXPath.
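A rough sketch of the parser-based approach, under some assumptions: the question only shows a fragment, so the surrounding table, row and first-cell markup below is hypothetical, and the XPath expression is just one possible way to locate the value.

```php
<?php
// Hypothetical full markup: only the part from "<strong>Label:" onward comes
// from the question; the enclosing <table>, <tr> and first <td> are assumed.
$html = <<<HTML
<table><tr>
<td valign="top" width="18%"><font face="Arial" size="2"><strong>Label:</strong></font></td>
<td valign="top" width="82%"> <font face="Arial" size="2">
I. NEED. ISOLATION </font> </td>
</tr></table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Find the cell whose <strong> text is "Label:", then take the next cell
// in the same row.
$query = '//td[.//strong[normalize-space(.)="Label:"]]/following-sibling::td[1]';
$nodes = $xpath->query($query);

// $value holds the trimmed cell text, or null when no cell was found.
$value = ($nodes !== false && $nodes->length > 0)
    ? trim($nodes->item(0)->textContent)
    : null;
echo $value, "\n";
```

Because the lookup keys on the label text and the cell position rather than on exact attributes and whitespace, it should keep working if the width or font attributes change.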
Upvotes: 0
Reputation: 526643
See this previous StackOverflow question.
That said, the escaping issue is due to the /
characters inside the pattern, which confuse the regex parser because you're already using /
as the regex delimiter.
Upvotes: 3