simple

Reputation: 11743

REGEX (.*) and newline

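My regex matches when the cell content sits on one line, but fails when the content contains newlines. How can I fix this?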

REGEX:
$match_expression = '/Rt..tt<\/td> <td>(.*)<\/td>/';
preg_match($match_expression, $text, $matches1);
$final = $matches1[1];


//THIS IS WORKING
<tr> <td class="rowhead vtop">Rtštt</td> <td><img border=0 src="http://somephoto"><br /> <br />INFO INFO INFO</td>
</tr> 


//THIS IS NOT WORKING
<tr> <td class="rowhead vtop">Rtštt</td> <td> <br />
INFO<br />
INFO<br /></td></tr>

Upvotes: 1

Views: 1388

Answers (3)

ghostdog74

Reputation: 342649

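// $str holds the HTML; split it into rows at each closing </tr>
$s = explode('</tr>', $str);
foreach ($s as $v) {
    // look for the image tag inside this row
    $m = strpos($v, "img border");
    if ($m !== false) {
        // print the row from the image tag onward
        print substr($v, $m);
    }
}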

Upvotes: 0

Andrew Moore

Reputation: 95424

And this is exactly why you shouldn't be using Regular Expressions to extract data from an HTML document.

The markup structure is so arbitrary that regex-based extraction is simply too unreliable. That is why I won't give you a regular expression to use: there is no proper one (the solutions given by other users might work... until they break). Use a DOM parser such as DOMDocument or phpQuery to extract data from your document.

Here is an example using phpQuery:

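$pq = phpQuery::newDocumentFile('somefile.html');
// Select the label cells; each value sits in the next <td> of the same row.
$cells = $pq->find('td.rowhead.vtop');

$matches = array();

// Iterating a phpQuery result yields plain DOM nodes, so wrap each one with pq().
foreach ($cells as $cell) {
  $matches[] = pq($cell)->next('td')->html();
}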
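
If you would rather not add a library, roughly the same thing can be done with the built-in DOMDocument and DOMXPath classes. This is only a sketch: the $html string is made-up sample markup modelled on the question (wrapped in a <table> so it parses cleanly), and the XPath expression assumes the label cell carries exactly the class attribute "rowhead vtop".

```php
<?php
// Hypothetical sample input modelled on the failing case in the question.
$html = '<table><tr> <td class="rowhead vtop">Rt&#353;tt</td> <td> <br />' . "\n"
      . 'INFO<br />' . "\n"
      . 'INFO<br /></td></tr></table>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them for sloppy markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// For each label cell, take the first <td> that follows it in the same row.
$cells = $xpath->query('//td[@class="rowhead vtop"]/following-sibling::td[1]');

$matches = array();
foreach ($cells as $cell) {
    // Serialize the child nodes one by one to get the cell's inner HTML.
    $inner = '';
    foreach ($cell->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $matches[] = trim($inner);
}

print_r($matches);
```

Newlines inside the cell make no difference here, because the parser works on the element tree rather than on lines of text.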

Upvotes: 5

Turtle

Reputation: 1350

You're doing it wrong!

Having said that, a solution to your question is:

/Rt..tt<\/td> <td>(.*)<\/td>/

should be

/Rt..tt<\/td> <td>(.*)<\/td>/s

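By default the dot does not match a newline; the s modifier (PCRE_DOTALL) makes it match newlines as well. See http://php.net/manual/en/reference.pcre.pattern.modifiers.php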
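
To see the difference, here is a small sketch that runs both patterns against input shaped like the failing case. The $text value is made up for illustration, and the š is written as its two UTF-8 bytes so that the two dots in the pattern line up with it. The lazy (.*?) in the second pattern is my own addition, not part of the original expression; it stops the capture at the first </td> instead of the last one.

```php
<?php
// Hypothetical sample input; "\xC5\xA1" is the UTF-8 encoding of š (two bytes).
$text = "<tr> <td class=\"rowhead vtop\">Rt\xC5\xA1tt</td> <td> <br />\n"
      . "INFO<br />\n"
      . "INFO<br /></td></tr>";

// Without /s the dot stops at the first newline, so the closing </td> is never reached.
$without = preg_match('/Rt..tt<\/td> <td>(.*)<\/td>/', $text, $m1);

// With /s (PCRE_DOTALL) the dot also matches newlines.
// (.*?) is lazy, so the capture ends at the first </td> that follows.
$with = preg_match('/Rt..tt<\/td> <td>(.*?)<\/td>/s', $text, $m2);

var_dump($without); // int(0)
var_dump($with);    // int(1)
echo $m2[1], "\n";  // the cell content, newlines included
```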

Upvotes: 3
