osm

Reputation: 4218

Parsing content in html tags using regex

I want to extract the content from each of these forms:

<td>content</td>
<td *?*>content</td>
<td *specific td class*>content</td>

How can I do this with a regular expression in PHP, using preg_match?

Upvotes: 0

Views: 935

Answers (4)

ghostdog74

Reputation: 342313

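@OP, here's one way:

<?php
// Sample input: three single-line cells plus one multiline cell.
$str = <<<A
<td>content</td>
<td *?*>content</td>
<td *specific td class*>content</td>
<td *?*> multiline
content </td>
A;

// Split on the closing tag, so each piece ends just before a "</td>".
$s = explode("</td>", $str);
foreach ($s as $b) {
    // Strip the opening <td ...> tag and anything before it on that line.
    // "." does not match newlines, so text on later lines is kept.
    $b = preg_replace("/.*<td.*>/", "", $b);
    print $b . "\n";
}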

Output:

$ php test.php
content

content

content

 multiline
content

Upvotes: 0

Emil Vikström

Reputation: 91902

I think this sums it up pretty well.

In short, don't use regular expressions to parse HTML. Instead, look at PHP's DOM classes, especially DOMDocument::loadHTML.
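A minimal sketch of that approach for the cells in the question. The sample markup in `$html` is hypothetical. `getElementsByTagName('td')` returns every cell regardless of its attributes, and `textContent` returns the cell's text, including text inside any nested tags.

```php
<?php
// Hypothetical sample markup; replace with your own HTML.
$html = '<table><tr>'
      . '<td>content</td>'
      . '<td align="left">content</td>'
      . '<td class="specific">content</td>'
      . '</tr></table>';

$doc = new DOMDocument();
// Collect libxml warnings instead of printing them (real-world HTML is often sloppy).
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    // textContent includes the text of nested elements; trim surrounding whitespace.
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Unlike a regex, this keeps working when a cell contains nested markup or spans several lines.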

Upvotes: 4

yu_sha

Reputation: 4410

For <td>content</td>: <td>([^<]*)</td>

For <td *specific td class*>content</td>: <td[^>]*class=\"specific_class\"[^>]*>([^<]*)<
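A minimal sketch of running these patterns with preg_match_all, which collects every match (preg_match stops at the first one). The sample string and the class name `specific_class` are hypothetical. The third pattern is an added assumption covering the question's `<td *?*>` case, i.e. a cell with arbitrary attributes.

```php
<?php
// Hypothetical sample input containing the different kinds of cell.
$html = '<td>plain</td><td class="specific_class" id="x">wanted</td><td align="left">other</td>';

// A <td> with no attributes: capture everything up to the next '<'.
preg_match_all('/<td>([^<]*)<\/td>/', $html, $plain);

// A <td> whose attributes include class="specific_class".
preg_match_all('/<td[^>]*class="specific_class"[^>]*>([^<]*)</', $html, $specific);

// A <td> with or without attributes.
preg_match_all('/<td[^>]*>([^<]*)<\/td>/', $html, $any);

// Index 1 holds the first capture group of every match.
print_r($plain[1]);
print_r($specific[1]);
print_r($any[1]);
```

With this input, `$plain[1]` holds `plain`, `$specific[1]` holds `wanted`, and `$any[1]` holds all three cell texts. Because `[^<]*` stops at the first `<`, a cell with nested markup such as `<td><b>x</b></td>` is not matched by the first and third patterns.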

Upvotes: 0

Pascal MARTIN

Reputation: 400932

If you have an HTML document, you really shouldn't use regular expressions to parse it: HTML is just not "regular" enough for that.

A far better solution is to load your HTML document with a DOM parser. For instance, DOMDocument::loadHTML combined with XPath queries usually does a really great job!
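A minimal sketch of the XPath part, covering the "specific td class" case. The class name `specific` and the sample markup are hypothetical placeholders. The `concat`/`normalize-space` idiom matches `specific` as a whole word in the class list, so a cell with `class="unspecific"` is not picked up.

```php
<?php
// Hypothetical sample markup; "specific" stands in for your real class name.
$html = '<table><tr>'
      . '<td>plain</td>'
      . '<td class="specific other">wanted</td>'
      . '<td class="unspecific">skipped</td>'
      . '</tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select td elements whose class list contains the whole word "specific".
// Padding the attribute with spaces avoids matching "unspecific" as a substring.
$query = '//td[contains(concat(" ", normalize-space(@class), " "), " specific ")]';

$wanted = [];
foreach ($xpath->query($query) as $td) {
    $wanted[] = trim($td->textContent);
}

print_r($wanted);
```

To select every cell instead, the query can simply be `//td`.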

Upvotes: 3
