Reputation: 11743

REGEX (.*) and newline

My regex matches when the cell's content is on a single line, but not when it spans several lines. How can I fix this?

REGEX:
$match_expression = '/Rt..tt<\/td> <td>(.*)<\/td>/';
preg_match($match_expression, $text, $matches1);
$final = $matches1[1];


//THIS IS WORKING
<tr> <td class="rowhead vtop">RtÅ¡tt</td> <td><img border=0 src="http://somephoto"><br /> <br />INFO INFO INFO</td>
</tr> 


//THIS IS NOT WORKING
<tr> <td class="rowhead vtop">RtÅ¡tt</td> <td> <br />
IFNO<br />
INFO<br /></td></tr>

Upvotes: 1

Answers (3)

ghostdog74

Reputation: 342649

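// $str holds the HTML; split it into one chunk per table row
$s = explode('</tr>', $str);
foreach ($s as $v) {
    // look for the start of the image tag inside this row
    $m = strpos($v, "img border");
    if ($m !== FALSE) {
        // print everything from "img border" to the end of the row
        print substr($v, $m);
    }
}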

Upvotes: 0

Andrew Moore

Reputation: 95424

This is exactly why you shouldn't use regular expressions to extract data from an HTML document.

HTML markup is too arbitrary for a regex to handle reliably, so I won't give you a "proper" regular expression, because there isn't one (the solutions given by other users might work... until they break). Use a DOM parser such as DOMDocument or phpQuery to extract the data instead.

Here is an example using phpQuery:

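$pq = phpQuery::newDocumentFile('somefile.html');
// select the label cells, then step up to their rows
$rows = $pq->find('td.rowhead.vtop')->parent();

$matches = array();

foreach ($rows as $row) {
  // iteration yields plain DOM nodes, so wrap each one in pq() again
  $matches[] = pq($row)->children('td')->eq(1)->html();
}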
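
For comparison, here is a rough sketch of the same idea with DOMDocument and DOMXPath, which ship with PHP. The input string is hypothetical: it reuses the two rows from the question inside a table, with a plain "Label" placeholder in the first cell. The XPath expression assumes the class attribute is exactly "rowhead vtop".

```php
<?php
// Hypothetical input: the two rows from the question, wrapped in a <table>.
$html = '<table>'
      . '<tr> <td class="rowhead vtop">Label</td> <td><img border=0 src="http://somephoto"><br /> <br />INFO INFO INFO</td></tr>'
      . "<tr> <td class=\"rowhead vtop\">Label</td> <td> <br />\nIFNO<br />\nINFO<br /></td></tr>"
      . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The first <td> that follows each cell whose class is "rowhead vtop".
$cells = $xpath->query('//td[@class="rowhead vtop"]/following-sibling::td[1]');

$matches = array();
foreach ($cells as $cell) {
    // textContent keeps only the text and drops the tags.
    $matches[] = trim($cell->textContent);
}

print_r($matches);
```

Line breaks inside the cell make no difference here, because the parser works on the element tree rather than on lines of text.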

Upvotes: 5

Turtle

Reputation: 1350

You're doing it wrong!

Having said that, a solution to your question is:

/Rt..tt<\/td> <td>(.*)<\/td>/

should be

/Rt..tt<\/td> <td>(.*)<\/td>/s

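The s modifier (PCRE_DOTALL) makes the dot match newlines as well, which it does not do by default. See http://php.net/manual/en/reference.pcre.pattern.modifiers.php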
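
A minimal sketch of the difference, run against the multi-line row from the question. Two things here are assumptions and not part of the original pattern: the bytes \xC5\xA1 standing in for the label's non-ASCII character, and the lazy .*? quantifier.

```php
<?php
// The multi-line row from the question. The label's non-ASCII character is
// written as the bytes \xC5\xA1 (an assumption about the original encoding).
$text = "<tr> <td class=\"rowhead vtop\">Rt\xC5\xA1tt</td> <td> <br />\n"
      . "IFNO<br />\n"
      . "INFO<br /></td></tr>";

// Without /s the dot cannot cross a newline, so the pattern fails here.
$without = preg_match('/Rt..tt<\/td> <td>(.*)<\/td>/', $text, $m1);

// With /s the dot also matches newlines. The lazy .*? (an addition) stops
// at the first closing </td> instead of the last one in the input.
$with = preg_match('/Rt..tt<\/td> <td>(.*?)<\/td>/s', $text, $m2);

var_dump($without);   // int(0)
var_dump($with);      // int(1)
echo trim($m2[1]), "\n";
```

If $text contains several rows, a greedy (.*) under /s would run on to the last </td> in the input, which is why the sketch uses the lazy form.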

Upvotes: 3
