I have the following code snippet, which parses my blog site and stores some information in variables:
global $articles;
// Every <div class="blogpost"> on the page
$items = $html->find('div[class=blogpost]');
foreach ($items as $post) {
    // [0] => post title, [1] => the remaining HTML, which contains the link
    $articles[] = array($post->children(0)->innertext,
                        $post->children(1)->first_child()->outertext);
}
foreach ($articles as $item) {
    echo $item[0];
    echo $item[1];
    echo "<br>";
}
The code above outputs the following:
Title of blog post 1 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/cool_news" id="963" target="_blank" >Click here for news</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Title of blog post 2 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/neato" id="963" target="_blank" >Click here for neato</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Title of blog post 3 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/lame" id="963" target="_blank" >Click here for lame</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Here $item[0] contains "Title of blog post X" and $item[1] contains the rest.
What I want to do is parse $item[1] and retain only the URL contained within it as a separate variable. Perhaps I am not phrasing my question correctly, but I cannot find anything that can help me figure this out.
Can anyone help me?
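If you were to parse $item[1] into whatever DOM crawler object you were using for $html (called $itemDom here, a hypothetical name), you could use the following XPath. Note that XPath positions start at 1, so the first link is (//a)[1], not a[0]:
$itemDom->find('(//a)[1]/@href');
which selects the link's href attribute: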
href="http://www.example.com/cool_news"
You can then extract the URL with PHP string functions, or refine the XPath query to return the value directly. I'm not sure what the XPath would be to get just the value; perhaps someone can expand on that.
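One way to get just the value is to wrap the path in XPath's string() function and evaluate it with PHP's built-in DOMXPath::evaluate(), which then returns a plain string instead of an attribute node (or an empty string if there is no link). This is a sketch that swaps in PHP's DOM extension rather than Simple HTML DOM; the $fragment variable is a hypothetical stand-in for $item[1], copied from the output in the question.

```php
<?php
// Hypothetical stand-in for $item[1], taken from the question's output.
$fragment = '<script type="text/javascript">execute_function(3,\'\')</script>'
    . '<a href="http://www.example.com/cool_news" id="963" target="_blank" >Click here for news</a> '
    . '<img src="/news.gif" width="12" height="12" title="validated" /><span class="title">';

// The fragment is not well-formed (unclosed <span>), so collect libxml
// warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($fragment);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// (//a)[1] is the first <a> in the document (positions are 1-based);
// string() converts its href attribute node to the attribute's text.
$url = $xpath->evaluate('string((//a)[1]/@href)');

echo $url, "\n"; // http://www.example.com/cool_news
```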
EDIT: Seeing as you're using Simple HTML DOM Parser, try the following:
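require_once 'simple_html_dom.php'; // adjust to where the library file lives
$blogItemHtml = new simple_html_dom();
$blogItemHtml->load($articles[0][1]); // parse the HTML of the first post
$anchors = $blogItemHtml->find('a'); // every <a> element in that HTML
echo $anchors[0]->href; // "http://www.example.com/cool_news"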
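To keep the URL in a separate variable for every post rather than only the first, the same lookup can run inside the loop. This sketch again uses PHP's built-in DOMDocument instead of Simple HTML DOM, and the sample $articles array (titles plus shortened link fragments from the question's output) is an assumption standing in for the scraped data.

```php
<?php
// Hypothetical sample data shaped like the question's $articles array:
// [0] => post title, [1] => HTML fragment containing the link.
$articles = [
    ['Title of blog post 1', '<a href="http://www.example.com/cool_news" id="963" target="_blank">Click here for news</a>'],
    ['Title of blog post 2', '<a href="http://www.example.com/neato" id="963" target="_blank">Click here for neato</a>'],
    ['Title of blog post 3', '<a href="http://www.example.com/lame" id="963" target="_blank">Click here for lame</a>'],
];

libxml_use_internal_errors(true); // scraped fragments are rarely well-formed

$urls = [];
foreach ($articles as $i => $item) {
    $doc = new DOMDocument();
    $doc->loadHTML($item[1]);
    // First <a> element in the fragment, or null if there is none.
    $anchor = $doc->getElementsByTagName('a')->item(0);
    // $urls[$i] holds the URL for $articles[$i] (null when no link exists).
    $urls[$i] = $anchor !== null ? $anchor->getAttribute('href') : null;
}
libxml_clear_errors();

foreach ($articles as $i => $item) {
    echo $item[0], ': ', $urls[$i], "\n";
}
```

Keeping $urls indexed in parallel with $articles leaves the original array untouched; storing the URL as $articles[$i][2] instead would work just as well if you prefer one record per post.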