Reputation: 1477
I have HTML like this:
<ul id="video-tags">
<li><em>Tagged: </em></li>
<li><a href="/tags/sports">sports</a>, </li>
<li><a href="/tags/entertain">entertain</a>, </li>
<li><a href="/tags/funny">funny</a>, </li>
<li><a href="/tags/comedy">comedy</a>, </li>
<li><a href="/tags/automobile">automobile</a>, </li>
<li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>
How can I extract sports, entertain, funny, comedy, and automobile into a string?
My PHP preg_match_all call looks like this:
preg_match_all('/<a href\="\/tags\/(.*?)\">(.*?)<\/a>, <\/li>/', $this->page, $matches);
echo var_dump($matches);
echo implode(' ', $tags);
It does not work.
Upvotes: 4
Views: 7583
Reputation: 57690
This shorter regex does the same thing:
preg_match_all('|tags/[^>]*>([^<]*)|', $str, $matches);
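Below is a self-contained sketch for trying the shorter regex against the question's sample markup (the `$str` heredoc just reuses that HTML). On this input the pattern also matches the trailing "more tags" link, whose text starts with `<strong>`, so it captures an empty string there. The sketch filters out empty captures before joining.

```php
<?php
// Sample markup copied from the question
$str = <<<HTML
<ul id="video-tags">
<li><em>Tagged: </em></li>
<li><a href="/tags/sports">sports</a>, </li>
<li><a href="/tags/entertain">entertain</a>, </li>
<li><a href="/tags/funny">funny</a>, </li>
<li><a href="/tags/comedy">comedy</a>, </li>
<li><a href="/tags/automobile">automobile</a>, </li>
<li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>
HTML;

// Match "tags/", skip to the ">" that closes the <a> tag,
// then capture the link text up to the next "<"
preg_match_all('|tags/[^>]*>([^<]*)|', $str, $matches);

// The "/tags/" link yields an empty capture; drop empty strings and reindex
$tags = array_values(array_filter($matches[1], 'strlen'));

echo implode(' ', $tags), "\n"; // sports entertain funny comedy automobile
```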
Alternatively, with DOMDocument:
$d = new DOMDocument();
$d->loadHTML($str);
$as = $d->getElementsByTagName('a');
$result = array();
// Stop one short of the end to skip the trailing "more tags" link
for ($i = 0; $i < $as->length - 1; $i++) {
    $result[] = $as->item($i)->textContent;
}
echo implode(' ', $result);
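The loop above relies on the unwanted link always being last. A sketch of a variant that instead checks each link's `href` and keeps only those of the form `/tags/<name>`, assuming the markup looks like the question's sample:

```php
<?php
$str = <<<HTML
<ul id="video-tags">
<li><em>Tagged: </em></li>
<li><a href="/tags/sports">sports</a>, </li>
<li><a href="/tags/entertain">entertain</a>, </li>
<li><a href="/tags/funny">funny</a>, </li>
<li><a href="/tags/comedy">comedy</a>, </li>
<li><a href="/tags/automobile">automobile</a>, </li>
<li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>
HTML;

// Collect libxml warnings internally instead of printing them
libxml_use_internal_errors(true);
$d = new DOMDocument();
$d->loadHTML($str);
libxml_clear_errors();

$result = [];
foreach ($d->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep links that start with "/tags/" and have something after it;
    // this skips the bare "/tags/" link wherever it appears
    if (strpos($href, '/tags/') === 0 && strlen($href) > strlen('/tags/')) {
        $result[] = $a->textContent;
    }
}

echo implode(' ', $result), "\n";
```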
Upvotes: 2
Reputation: 4367
I'm not sure where you're getting $this->page from; however, the following should work as you expect:
<?php
$page = 'subject string ...';
preg_match_all('/<a href\="\/tags\/(.*?)\">(.*?)<\/a>, <\/li>/', $page, $matches);
echo implode(', ', $matches[1]);
?>
Substitute your $this->page for the $page variable, as long as it is still a string.
However, I'd suggest not parsing HTML with regular expressions. Instead, use a library such as PHP's DOMDocument or SimpleHTMLdom to parse the HTML properly.
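As one illustration of the parser route, here is a sketch using PHP's built-in DOMXPath (not SimpleHTMLdom, which is a separate third-party library). It assumes the list keeps its `id="video-tags"` attribute and that tag links look like `/tags/<name>`:

```php
<?php
$html = <<<HTML
<ul id="video-tags">
<li><em>Tagged: </em></li>
<li><a href="/tags/sports">sports</a>, </li>
<li><a href="/tags/entertain">entertain</a>, </li>
<li><a href="/tags/funny">funny</a>, </li>
<li><a href="/tags/comedy">comedy</a>, </li>
<li><a href="/tags/automobile">automobile</a>, </li>
<li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Links inside the tag list whose href continues past "/tags/" (6 characters)
$nodes = $xpath->query(
    '//ul[@id="video-tags"]//a[starts-with(@href, "/tags/") and string-length(@href) > 6]'
);

$tags = [];
foreach ($nodes as $node) {
    $tags[] = trim($node->textContent);
}

echo implode(' ', $tags), "\n";
```

Scoping the query to the `<ul>` means other `/tags/` links elsewhere on the page are ignored.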
Upvotes: 4
Reputation: 23759
This worked perfectly for me:
preg_match_all('/<a href\="\/tags\/(.*?)\">.*?<\/a>, <\/li>/', $str, $matches);
echo implode(',', $matches[1]);
Prints: sports,entertain,funny,comedy,automobile
$this->page is probably empty; that's why you are not getting any data.
Why do you use two capturing groups in the regex? The same words appear in both the URL and the link text.
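To tell an empty page apart from a pattern that simply doesn't match, one option is to check the input and the return value of `preg_match_all`, which is the number of full matches, or `false` on error. A minimal sketch; `extractTags` is a hypothetical helper name, and throwing exceptions is just one way to report the problem:

```php
<?php
// Hypothetical helper: returns the tag slugs, or throws if something is wrong
function extractTags(string $page): array
{
    if ($page === '') {
        // e.g. $this->page was never populated
        throw new InvalidArgumentException('Page content is empty');
    }
    // A single capturing group is enough: the slug in the href
    $count = preg_match_all('/<a href="\/tags\/(.*?)">.*?<\/a>, <\/li>/', $page, $matches);
    if ($count === false) {
        throw new RuntimeException('Regex error code: ' . preg_last_error());
    }
    return $matches[1]; // empty array when $count is 0
}

$sample = "<li><a href=\"/tags/sports\">sports</a>, </li>\n<li><a href=\"/tags/funny\">funny</a>, </li>";
echo implode(',', extractTags($sample)), "\n";
```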
Upvotes: 1