Reputation: 11743
How can I fix this regex so that it also matches when the cell content spans several lines?
REGEX:
//REGEX
$match_expression = '/Rt..tt<\/td> <td>(.*)<\/td>/';
preg_match($match_expression,$text,$matches1);
$final = $matches1[1];
//THIS IS WORKING
<tr> <td class="rowhead vtop">Rtštt</td> <td><img border=0 src="http://somephoto"><br /> <br />INFO INFO INFO</td>
</tr>
//THIS IS NOT WORKING
<tr> <td class="rowhead vtop">Rtštt</td> <td> <br />
IFNO<br />
INFO<br /></td></tr>
Upvotes: 1
Views: 1388
Reputation: 342649
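// Split the document into rows; for rows containing "img border", print from there onward.
$s = explode('</tr>', $str);
foreach ($s as $v) {
    $m = strpos($v, "img border");
    if ($m !== FALSE) {
        print substr($v, $m);
    }
}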
Upvotes: 0
Reputation: 95424
This is exactly why you shouldn't use regular expressions to extract data from an HTML document.
The markup structure is too arbitrary for a pattern to be reliable, so I won't give you a regular expression: there is no proper one (the solutions given by other users might work... until they break). Use a DOM parser like DOMDocument or phpQuery to extract data from your document instead.
Here is an example using phpQuery:
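$pq = phpQuery::newDocumentFile('somefile.html');
// Select the label cells; iterating yields plain DOM nodes, so re-wrap each one with pq().
$labels = $pq->find('td.rowhead.vtop');
$matches = array();
foreach ($labels as $label) {
    // The value sits in the <td> immediately after the label cell.
    $matches[] = pq($label)->next('td')->html();
}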
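If you would rather not add a library, roughly the same idea can be sketched with the built-in DOMDocument and DOMXPath classes. This is only an illustration under a few assumptions: the two rows from the question are pasted into a hypothetical $html string and wrapped in a <table>, and the text of the cell is collected rather than its inner HTML.

```php
<?php
// Both sample rows from the question, wrapped in a <table> (an assumption made
// here so that the fragment parses as table markup).
$html = '<table>'
    . '<tr> <td class="rowhead vtop">Rtštt</td> <td><img border=0 src="http://somephoto"><br /> <br />INFO INFO INFO</td></tr>'
    . "<tr> <td class=\"rowhead vtop\">Rtštt</td> <td> <br />\nIFNO<br />\nINFO<br /></td></tr>"
    . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// For every label cell, take the first <td> that follows it in the same row.
$cells = $xpath->query('//td[contains(@class, "rowhead")]/following-sibling::td[1]');

$matches = array();
foreach ($cells as $cell) {
    // textContent drops the tags; line breaks inside the cell no longer matter.
    $matches[] = trim($cell->textContent);
}
print_r($matches);
```

Because the parser works on the element tree, the single-line row and the multi-line row are handled by the same query.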
Upvotes: 5
Reputation: 1350
Having said that, a solution to your question is:
/Rt..tt<\/td> <td>(.*)<\/td>/
should be
/Rt..tt<\/td> <td>(.*)<\/td>/s
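The s modifier (PCRE_DOTALL) makes the dot match newline characters too, so (.*) can run across the line breaks inside the cell. See http://php.net/manual/en/reference.pcre.pattern.modifiers.php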
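A quick way to check the difference is to run both patterns against the failing row from the question. This is a small sketch, not a robust extractor: the $text value is the sample row retyped by hand, and the lazy (.*?) is my own tweak, since a greedy (.*) combined with /s would run on to the last </td> in a longer document.

```php
<?php
// The row that failed in the question; "š" is written as its two UTF-8 bytes,
// which is what the two dots in the pattern are matching.
$text = "<tr> <td class=\"rowhead vtop\">Rt\xC5\xA1tt</td> <td> <br />\nIFNO<br />\nINFO<br /></td></tr>";

// Without /s the dot stops at "\n", so the pattern never reaches </td>.
$withoutS = preg_match('/Rt..tt<\/td> <td>(.*)<\/td>/', $text, $m1);

// With /s the dot also matches "\n"; (.*?) stops at the first closing </td>.
$withS = preg_match('/Rt..tt<\/td> <td>(.*?)<\/td>/s', $text, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo $m2[1], "\n";
```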
Upvotes: 3