em202020

Reputation: 59

file_get_contents() suddenly not working

This is all of my code:

<html>
<body>
<form>
Playlist to Scrape: <input type="text" name="url" placeholder="Playlist URL">
<input type="submit">
</form>


<?php

if(isset($_GET['url'])){

        $source = file_get_contents($_GET['url']);
        $regex = '/<a href="(.*?)" class="gothere pl-button" title="/';

        preg_match_all($regex,$source,$output);
        echo "<textarea cols=100 rows=50>";
        $fullUrl = array();
        foreach($output[1] as $url){
                array_push($fullUrl,"http://soundcloud.com".$url);
        }
        $final = implode(";",$fullUrl);
        echo $final;
        echo "</textarea>";
}else{
        echo "borks";
}


?>
</body>
</html>

Yesterday, it worked fine. The code is supposed to take a SoundCloud playlist URL, extract the individual song links, and print them separated by semicolons, like song1;song2;song3.
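One way to tell whether the fetch or the parsing is at fault is to move the parsing into its own function and feed it HTML you control. The sketch below reuses the question's regex unchanged. extractTrackUrls is a hypothetical name, and the sketch assumes SoundCloud's markup still contains class="gothere pl-button", which may no longer be true.

```php
<?php
// Hypothetical helper: parse already-fetched playlist HTML and return the
// track links joined with ";". Assumes the markup still matches the
// question's regex (class="gothere pl-button").
function extractTrackUrls(string $html): string
{
    $regex = '/<a href="(.*?)" class="gothere pl-button" title="/';
    $fullUrls = [];
    if (preg_match_all($regex, $html, $matches)) {
        foreach ($matches[1] as $path) {
            $fullUrls[] = 'http://soundcloud.com' . $path;
        }
    }
    return implode(';', $fullUrls);
}

// Usage with a small hand-written sample instead of a live request:
$sample = '<a href="/artist/song1" class="gothere pl-button" title="x">'
        . '<a href="/artist/song2" class="gothere pl-button" title="y">';
echo extractTrackUrls($sample), "\n";
// prints http://soundcloud.com/artist/song1;http://soundcloud.com/artist/song2
```

If this prints the expected links but the live script prints nothing, the problem is in what file_get_contents() returns, not in the regex.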

Again, this worked fine yesterday, and I don't think I've changed anything since...

I tried commenting out the rest of the code, keeping only $source = file_get_contents($_GET['url']); and echoing $source, but the output was blank, which makes me think the problem is with file_get_contents.
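A blank page does not by itself prove the request failed: file_get_contents() returns false on failure, and echo prints nothing for false or for an empty string. A small helper such as the hypothetical describeFetchResult below (a sketch, not part of the original script) makes the cases visible; var_dump($source) gives the same distinction more tersely.

```php
<?php
// Hypothetical helper: explain what file_get_contents() returned.
// echo prints nothing for both false and "", so a blank page alone
// cannot distinguish a failed request from an empty response.
function describeFetchResult($source): string
{
    if ($source === false) {
        // file_get_contents() raised a warning; error_get_last() holds it.
        $error = error_get_last();
        return 'request failed: ' . ($error['message'] ?? 'no error recorded');
    }
    if ($source === '') {
        return 'request succeeded but the body is empty';
    }
    return 'received ' . strlen($source) . ' bytes';
}

// Usage in the question's script (makes a real request, so not run here):
// $source = file_get_contents($_GET['url']);
// echo describeFetchResult($source);
```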

If you have any idea on why this is happening, I would appreciate hearing it. Thanks!

Upvotes: 2

Views: 5544

Answers (3)

Cibo FATA8

Reputation: 101

In my case (I was also frequently downloading one page, though not from SoundCloud), the cause was F5's "bobcmn" JavaScript detection on the server.

When I added something like var_dump($source); to my PHP script to see what the server was sending, I saw that the response started with this code: window["bobcmn"] = ...

More here: https://blog.dotnetframework.org/2017/10/10/understanding-f5-bobcmn-javascript-detection/
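If you suspect the same thing, a quick check on the fetched body can confirm it. This is a sketch under the assumption that the challenge page contains the string bobcmn, as in the response described above; looksLikeBobcmnChallenge is a hypothetical helper. When it returns true, the server sent a JavaScript challenge instead of the page, and file_get_contents() cannot execute JavaScript to get past it.

```php
<?php
// Hypothetical check: does the fetched body look like the F5 "bobcmn"
// JavaScript challenge instead of the real page? Matching on the bare
// string "bobcmn" is an assumption based on the response described above.
function looksLikeBobcmnChallenge($source): bool
{
    return is_string($source) && strpos($source, 'bobcmn') !== false;
}

// Usage (makes a real request, so not run here; $url is hypothetical):
// $source = file_get_contents($url);
// if (looksLikeBobcmnChallenge($source)) {
//     echo 'Got a JavaScript challenge page, not the content.';
// }
```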

Upvotes: 0

Christiaan Westerbeek

Reputation: 11157

What might have happened is that a new SSL certificate was installed on the server that file_get_contents is trying to access. In our case, the target server had a new SSL certificate installed on its domain, from a different vendor and for a different wildcard domain.

Adding a little extra configuration to our stream context fixed the problem.

 $opts = array(
   'http' => array(
     'method' => "GET",
     'header' => "Content-Type: application/json\r\n".
                 "Accept: application/json\r\n",
     'ignore_errors' => true
   ),
   // VVVVV   The extra config that fixed it
   'ssl' => array(
     'verify_peer' => false,
     'verify_peer_name' => false,
   )
   // ^^^^^
 );
 $context = stream_context_create($opts);
 $result = file_get_contents(THE_URL_WITH_A_CHANGED_CERTIFICATE, false, $context);

I found this solution thanks to this answer, which had even been downvoted.

This certainly explained why file_get_contents suddenly stopped working.
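Setting verify_peer and verify_peer_name to false makes PHP accept any certificate, including one presented by an attacker sitting between you and the server. If the new certificate simply chains to a CA that your PHP installation does not trust, a narrower option is to keep verification on and point the cafile SSL context option at a PEM bundle that includes the new issuer. A minimal sketch, where buildVerifiedContextOptions and the bundle path are assumptions:

```php
<?php
// Hypothetical helper: stream context options that keep certificate
// verification on but trust a specific CA bundle, instead of disabling
// verify_peer / verify_peer_name.
function buildVerifiedContextOptions(string $caFile): array
{
    return [
        'http' => [
            'method'        => 'GET',
            'header'        => "Accept: application/json\r\n",
            'ignore_errors' => true,
        ],
        'ssl' => [
            'verify_peer'      => true,
            'verify_peer_name' => true,
            'cafile'           => $caFile, // PEM bundle that includes the new issuer
        ],
    ];
}

// Usage (makes a real request, so not run here; path and URL are assumptions):
// $context = stream_context_create(buildVerifiedContextOptions('/etc/ssl/certs/ca-bundle.pem'));
// $result  = file_get_contents('https://example.com/api', false, $context);
```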

Upvotes: 4

Alana Storm

Reputation: 166126

Your question doesn't have enough information for someone to help you.

To start with, though, I would:

  • Check that the script is receiving the url GET parameter correctly (var_dump($_GET['url']))
  • Check what PHP fetches from the URL (var_dump(file_get_contents($_GET['url'])))
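The two checks above can be combined into one throwaway debugging script. This is a sketch: fetchForDebugging is a hypothetical helper, and it relies on error_get_last() to recover the warning that file_get_contents() raises on failure. For an HTTP error, that message typically includes the status line, such as HTTP/1.1 403 Forbidden.

```php
<?php
// Hypothetical debugging helper: fetch a URL and report what happened
// instead of silently echoing a possibly-false result.
function fetchForDebugging(string $url): array
{
    // @ hides the on-page warning; error_get_last() still records it.
    $body = @file_get_contents($url);

    return [
        'url'         => $url,
        'body_length' => $body === false ? null : strlen($body),
        'error'       => $body === false ? (error_get_last()['message'] ?? null) : null,
    ];
}

// Only runs when the form has been submitted.
if (isset($_GET['url'])) {
    var_dump($_GET['url']);
    var_dump(fetchForDebugging($_GET['url']));
}
```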

My guess is that either your server admin turned off the fopen URL wrappers (the allow_url_fopen setting), or the owner of the site you're scraping decided they didn't want you scraping their site and is blocking requests from your PHP script.
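Both guesses can be checked from PHP itself. In this sketch, with hypothetical helper names, urlFopenEnabled() reads the allow_url_fopen setting, and createBrowserLikeContext() sends an explicit User-Agent with ignore_errors enabled, so the body of a 403 or similar response is returned instead of false and you can read what the site says. An explicit User-Agent will not get past a deliberate block; it only rules out the case where a site rejects requests that lack one.

```php
<?php
// Cause 1: allow_url_fopen disabled. Without it, file_get_contents()
// refuses http:// and https:// URLs and returns false with a warning.
function urlFopenEnabled(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Cause 2: the remote site rejecting the request. Hypothetical helper that
// sets an explicit User-Agent and keeps the body of error responses.
function createBrowserLikeContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent'    => $userAgent,
            'ignore_errors' => true, // return 4xx/5xx bodies instead of false
        ],
    ]);
}

// Usage (makes a real request, so it is left commented out):
// if (!urlFopenEnabled()) { exit('allow_url_fopen is off'); }
// $ctx = createBrowserLikeContext('Mozilla/5.0 (compatible; PlaylistScraper)');
// echo htmlspecialchars(file_get_contents($_GET['url'], false, $ctx));
```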

It also helps to turn error reporting all the way up and set display_errors to 1:

error_reporting(E_ALL);
ini_set('display_errors', 1);

Be warned, though: if you've been developing without this, there is probably a lot of code in your application that works but triggers warnings.

Good luck.

Upvotes: 1
