Reputation: 121
I'm trying to scrape text from this site. I want to extract aaa-a.nl, abcinkt.nl, accudeals.nl, etc.
Those URLs come from the <ul class="members members-list clearfix"> element, where each one sits inside an <li></li>.
How do I webscrape those in PHP?
Upvotes: 0
Views: 92
Reputation: 5483
Let's say you have already read the page (for example with cURL) into a variable $html. You can then extract the required elements as follows:
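$doc = new DOMDocument();
// Real-world HTML is often malformed; keep parser warnings out of the output
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);
if (!$sxml) {
    echo "ERROR. Do something to handle this.\n";
    exit(1);
}
// Match a <ul> whose class attribute contains the token "members-list"
$nodes = $sxml->xpath("//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]");
if (!$nodes) {
    echo "ERROR. No matching list found.\n";
    exit(1);
}
foreach ($nodes[0]->li as $member) {
    echo (string)$member->a, "\n"; // This will echo the strings you need
}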
*Not tested.
(To understand the XPath query in the code above, see: Getting DOM elements by classname)
Here I'm using DOMDocument and SimpleXML. You can do this in several other ways: by using the DOMDocument class alone to navigate the DOM, by using DOMDocument with DOMXPath, or maybe even by just using PHP string functions and regex.
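As a rough illustration of the DOMDocument with DOMXPath route, here is a minimal sketch. The function name extractMemberNames and the sample markup are made up for this example, and it assumes each <li> wraps its text in an <a> tag, which the question does not confirm.

```php
<?php
// Hypothetical helper: returns the link text of every <a> found inside the
// <li> items of any <ul> that carries the class token "members-list".
function extractMemberNames(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Padding with spaces makes this a whole-token match on the class attribute.
    $query = "//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]/li//a";

    $names = [];
    foreach ($xpath->query($query) as $anchor) {
        $names[] = trim($anchor->textContent);
    }
    return $names;
}

// Made-up sample markup standing in for the downloaded page (assumption).
$html = <<<'HTML'
<html><body>
<ul class="members members-list clearfix">
  <li><a href="http://aaa-a.nl">aaa-a.nl</a></li>
  <li><a href="http://abcinkt.nl">abcinkt.nl</a></li>
  <li><a href="http://accudeals.nl">accudeals.nl</a></li>
</ul>
</body></html>
HTML;

foreach (extractMemberNames($html) as $name) {
    echo $name, "\n";
}
```

If the real page puts the text directly in the <li> with no link, changing the end of the query from /li//a to /li should be enough.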
Upvotes: 1