Fernando Ferrari

Reputation: 586

Some content is missing from the cURL response

I am trying to develop a spider to collect data from other sites, purely for academic purposes. Specifically, I am trying to crawl this website: http://urlmin.com/ngz. Here is what happens: I can get all the data I want except the photo paths. Why? Because they are loaded with JavaScript, which is fine so far. Here is the JS code that loads the image elements after the DOM is loaded:

var exibirImg = new ExibirImagens();
exibirImg.Imagens = [

    new ItemImagem(
        '../fotosanuncios/13886-Papucha 20074.JPG',
        '../fotosanuncios/13886-p-Papucha 20074.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Motores Novos.JPG',
        '../fotosanuncios/13886-p-Motores Novos.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Panther reformada5.JPG',
        '../fotosanuncios/13886-p-Panther reformada5.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Panther reformada 2007.JPG',
        '../fotosanuncios/13886-p-Panther reformada 2007.JPG'),

];
exibirImg.PreLoad();
exibirImg.Titulo = 'Oferta A Gtr 323';
exibirImg.EscreveImagens();
exibirImg.TimeOutJs = 3500;
exibirImg.ImagemNotFound = 'imagens/ImagemNotFound.png';
exibirImg.IdImagemPrincipal = 'imagemPrincipalPF';
exibirImg.IdImagemMini = 'imagensPequenasPF';

It would be easy if cURL returned the JS as shown above, but it does not. What I get is this:

var exibirImg = new ExibirImagens();
exibirImg.Imagens = [

];
exibirImg.PreLoad();
exibirImg.Titulo = 'Oferta A Gtr 323';
exibirImg.EscreveImagens();
exibirImg.TimeOutJs = 3500;
exibirImg.ImagemNotFound = 'imagens/ImagemNotFound.png';
exibirImg.IdImagemPrincipal = 'imagemPrincipalPF';
exibirImg.IdImagemMini = 'imagensPequenasPF';

exibirImg.Iniciar();

So the array must be filled in by AJAX or something similar. But the real puzzle is that if I turn off JavaScript in my browser, the array still arrives with the image paths. The only explanation is that it is generated on the server side. So my question is: if it comes from the server side, why doesn't my cURL request get it?

Thanks, I hope this makes sense.

You can see that script in the same page's source, at line 262.

Upvotes: 0

Views: 164

Answers (1)

Andrey Volk

Reputation: 3549

Works for me:

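$url = 'http://urlmin.com/ngz';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects (e.g. from the short URL)

$result = curl_exec($ch);
if ($result !== false) {    // curl_exec() returns false on failure
    echo $result;
} else {
    echo "cURL error: " . curl_error($ch);
}

curl_close($ch);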

And $result contains (excerpt):

var exibirImg = new ExibirImagens();
exibirImg.Imagens = [

    new ItemImagem(
        '../fotosanuncios/13886-Papucha 20074.JPG',
        '../fotosanuncios/13886-p-Papucha 20074.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Motores Novos.JPG',
        '../fotosanuncios/13886-p-Motores Novos.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Panther reformada5.JPG',
        '../fotosanuncios/13886-p-Panther reformada5.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Panther reformada 2007.JPG',
        '../fotosanuncios/13886-p-Panther reformada 2007.JPG'),

];
exibirImg.PreLoad();
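Once $result holds the page, the image paths can be pulled out of the inline script with a regular expression. The sketch below is a hypothetical helper (extractImagePaths is not part of either post); it assumes every entry has the form new ItemImagem('<full image>', '<thumbnail>') with single-quoted paths, exactly as in the output above, and will miss entries written any other way.

```php
<?php
// Hypothetical helper: extract (full image, thumbnail) path pairs from the
// inline ExibirImagens script. Assumes each entry looks like
// new ItemImagem('<full path>', '<thumbnail path>') with single quotes.
function extractImagePaths(string $html): array
{
    $pattern = "/new\s+ItemImagem\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/";
    $images = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the full-size path, $m[2] the thumbnail ("-p-") path
            $images[] = ['full' => $m[1], 'thumb' => $m[2]];
        }
    }
    return $images; // empty array when the Imagens array is empty
}
```

Usage: extractImagePaths($result) returns one entry per ItemImagem. The paths are relative ('../fotosanuncios/...') and contain spaces, so before downloading them you would still need to resolve them against the page URL and URL-encode the file names.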

Upvotes: 1
