jessica

Reputation: 1687

Links that don't look like links

I have some code that gets all the links on a page, but some of the matches don't look like real links. For example, indexes 0-4 contain "javascript:void(0)", and index 5 contains just "/". How do I fix this? Thanks.

$content = file_get_contents("http://bestspace.co"); //get content of page

$links = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>"; //set regular expression to get links
preg_match_all("/$links/siU", $content, $matches); //find every <a> tag; the href values end up in $matches[2]

print_r($matches[2]);
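For comparison, here is a minimal sketch that collects the href values with PHP's built-in DOMDocument parser instead of a regular expression. It runs on a hypothetical inline HTML string standing in for the downloaded page and assumes the dom extension is enabled (it usually is). A parser does not depend on how the attribute is quoted, but it still returns the "javascript:void(0)" and "/" entries, so they have to be filtered out afterwards either way.

```php
<?php
// Hypothetical sample markup standing in for the file_get_contents() result.
$content = '<a href="javascript:void(0)">Menu</a>'
    . '<a href="/">Home</a>'
    . "<a href='/about-us'>About</a>"
    . '<a href="https://cloud.bestspace.co/clientarea.php">Client area</a>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them (real pages are rarely valid HTML).
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

// Read the href attribute of every <a> element, in document order.
$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```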

Contents of the array:

Array ( 

[0] => javascript:void(0) 
[1] => javascript:void(0) 
[2] => javascript:void(0) 
[3] => javascript:void(0) 
[4] => javascript:void(0) 
[5] => / 
[6] => /bestdeals 
[7] => /about-us 
[8] => /why-choose-us 
[9] => /products 
[10] => https://cloud.bestspace.co/clientarea.php 

etc. )

Upvotes: 0

Views: 50

Answers (1)

Barmar

Reputation: 781716

Use array_filter() to remove all the JavaScript links:

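$links = array_filter($matches[2], function($x) {
    // keep only hrefs that don't start with "javascript:" (case-insensitive, ignoring surrounding spaces)
    return stripos(trim($x), 'javascript:') !== 0;
});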
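A self-contained sketch of the same filter, run on the values from the question's print_r() output. It additionally drops the bare "/" entry; treating "/" as unwanted is an assumption taken from the question, since "/" is really a valid link to the site's root page. Because array_filter() preserves the original keys, array_values() is used to renumber the result from 0.

```php
<?php
// Sample values copied from the question's print_r() output.
$hrefs = [
    'javascript:void(0)',
    'javascript:void(0)',
    '/',
    '/bestdeals',
    '/about-us',
    'https://cloud.bestspace.co/clientarea.php',
];

// Drop javascript: pseudo-links (case-insensitive), empty hrefs,
// and the bare "/" root link (assumption: the question treats it as unwanted).
$links = array_filter($hrefs, function ($x) {
    $x = trim($x);
    return stripos($x, 'javascript:') !== 0 && $x !== '/' && $x !== '';
});

// array_filter() keeps the original keys, so reindex from 0.
$links = array_values($links);

print_r($links);
```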

Upvotes: 2
