user3639768

Reputation: 121

How to webscrape text inside class and element

I'm trying to scrape text from this site. I want to extract the domain names: aaa-a.nl, abcinkt.nl, accudeals.nl, etc.
Those URLs are inside the <li></li> elements of the <ul class="members members-list clearfix"> list.
How do I scrape those in PHP?

Upvotes: 0

Views: 92

Answers (1)

Nitin Nain

Reputation: 5483

Let's say you have already fetched the page (e.g. with cURL) into a variable $html. You can then extract the required elements as follows:

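libxml_use_internal_errors(true); // Real-world HTML is rarely valid; keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);
if (!$sxml) {
    exit("ERROR. Do something to handle this.\n");
}
// Match a <ul> whose class attribute contains the whole class name "members-list"
$nodes = $sxml->xpath("//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]");
if (!$nodes) {
    exit("No matching <ul> found.\n");
}
foreach ($nodes[0]->li as $member) {
    echo (string)$member->a, "\n"; // This will echo the strings you need
}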

*Not tested.

(To understand the XPath query in the above code, see: Getting DOM elements by classname)

Here I'm using DOMDocument and SimpleXML. You can do this in several other ways: by using the DOMDocument class alone to navigate the DOM, by using DOMDocument with DOMXPath, or maybe even by just using PHP string functions and regex.
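As a rough illustration of the DOMDocument with DOMXPath route mentioned above, here is a minimal sketch. The sample markup is an assumption: it reuses the class names and domain names from the question, but the href values and the function name extractMembers are made up for the example. In practice $html would hold the page you fetched.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
// The href values are invented; only the class names and link texts
// come from the question.
$html = <<<HTML
<html><body>
<ul class="members members-list clearfix">
  <li><a href="http://aaa-a.nl">aaa-a.nl</a></li>
  <li><a href="http://abcinkt.nl">abcinkt.nl</a></li>
  <li><a href="http://accudeals.nl">accudeals.nl</a></li>
</ul>
</body></html>
HTML;

// extractMembers is a hypothetical helper name, not part of any library.
function extractMembers(string $html): array
{
    // Collect libxml warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select every <a> inside an <li> of the <ul> whose class attribute
    // contains the whole class name "members-list".
    $query = "//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]/li/a";

    $members = [];
    foreach ($xpath->query($query) as $a) {
        $members[] = trim($a->textContent);
    }
    return $members;
}

print_r(extractMembers($html));
```

Going straight to the <a> elements in the XPath query avoids the separate loop over $nodes[0]->li, and an empty result simply yields an empty array instead of an error.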

Upvotes: 1
