Reputation: 1133
I'm using cURL to grab a page and I want to parse out the title of the post (the actual text shown on the link, not the title attribute of the <a>).
The HTML is like this:
<li class="topic">
<a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/">Title of blog post</a>
</li>
I tried using this code:
preg_match('/<\a title=\".*\" rel=\"bookmark\" href=\".*\">.*<\/a>/', $page, $matches);
But it's not working; PHP returns Array ( ) (an empty array).
Can anyone supply me with the regex to do this? I've tried online generators but they go right over my head. Cheers!
Upvotes: 1
Views: 391
Reputation: 342303
Here's another way, without a capturing regex:
$str = <<<A
<li class="topic">
<a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/">Title of blog post</a>
</li>
A;
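// Split on the closing tag so each chunk ends with the link text.
$s = explode("</a>", $str);
foreach ($s as $b) {
    if (strpos($b, "<a title") !== false) {
        // Strip everything up to and including the opening <a ...> tag.
        $b = preg_replace("/.*<a title.*>/ms", "", $b);
        print $b;
    }
}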
Output:
$ php test.php
Title of blog post
Upvotes: 0
Reputation: 6896
$str =
'<li class="topic">
<a title="Permanent Link to Blog Post"
rel="bookmark" href="http://www.website.com/blog-post/">
Title of blog post</a>
</li>';
// strip_tags() leaves the surrounding newlines in place, so trim them.
echo trim( strip_tags( $str ) ) ;
Gives:
Title of blog post
Upvotes: 0
Reputation: 816312
Add parentheses (a capture group) to your expression:
'/<a title=".*" rel="bookmark" href=".*">(.*)<\/a>/'
Everything between ( and ) will be returned in the array.
Edit:
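You have to remove the stray backslashes. The ones before the quotation marks are unnecessary inside a single-quoted string, and the one in <\a breaks the match, because PCRE reads \a as the bell character rather than a literal a.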
Edit2:
Just seen in the documentation for preg_match:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
You should also test your expression with sample text to make sure that it really does what you want to do.
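Putting the pieces of this answer together, a minimal sketch might look like the following. The sample markup is the snippet from the question, and $page stands in for the cURL response. One tweak that is not part of the expression above: [^"]* and .*? are used in place of the greedy .* so that each part stops at its own closing quote or at the first closing tag, which is an assumption about what is wanted when a page contains several links.

```php
<?php
// Sample markup copied from the question; $page stands in for the cURL response.
$page = <<<HTML
<li class="topic">
<a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/">Title of blog post</a>
</li>
HTML;

// The parentheses capture the link text. [^"]* stays inside one attribute
// value, and the lazy .*? stops at the first closing </a>.
$pattern = '/<a title="[^"]*" rel="bookmark" href="[^"]*">(.*?)<\/a>/';

$title = null;
if (preg_match($pattern, $page, $matches)) {
    // $matches[0] is the whole <a ...>...</a> element,
    // $matches[1] is the text matched by the first capture group.
    $title = $matches[1];
}

echo $title, "\n";
```

With the sample markup this prints Title of blog post. If the attributes on the real page appear in a different order, the pattern will need adjusting.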
Upvotes: 1
Reputation: 139431
Assuming you want the attribute, you could use:
if (preg_match('/<a\s+[^>]*?\btitle="(.+?)"/', $page, $matches)) {
echo $matches[1], "\n";
}
Parsing HTML can be tricky, and regular expressions aren't up to the job in the general case. For simple, sane documents, you can get away with it.
Just be aware that you're driving a screw with a hammer.
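For comparison, here is a rough sketch of the screwdriver: PHP's bundled DOM extension instead of a regex. This is a different technique from the preg_match call above, and it assumes the DOM extension is enabled and that the links of interest always sit inside li elements with class "topic", as in the question's sample. The XPath expression is my own illustration.

```php
<?php
// Sample markup copied from the question; $page stands in for the cURL response.
$page = <<<HTML
<li class="topic">
<a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/">Title of blog post</a>
</li>
HTML;

// Real-world pages are rarely valid HTML, so collect parser warnings
// internally instead of letting them be printed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

// Select every bookmark link that is a direct child of a "topic" list item.
$xpath = new DOMXPath($doc);
$titles = [];
foreach ($xpath->query('//li[@class="topic"]/a[@rel="bookmark"]') as $link) {
    // textContent is the visible link text;
    // $link->getAttribute('title') would return the title attribute instead.
    $titles[] = trim($link->textContent);
}

print_r($titles);
```

This keeps working when attributes are reordered or the tag spans several lines, which is where a hand-written pattern tends to break.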
Upvotes: 0