B.Koc

Reputation: 15

How to scrape data from a page generated by an ASMX web service

I have been searching the web but found nothing useful. I need to update my product prices from the supplier's website automatically, so I want to scrape the information for all products at once from the category page.

I used Simple HTML DOM to get the data. When I queried the tags for the prices, which I had found with the Firebug extension in Firefox, it printed nothing. I then tried to print all links on the category page, and no product links were among them. When I viewed the page source (right click on the page), I saw no markup related to products. The div is empty, like this:

<div class=coll-2 fleft> </div>

But in Firebug the same div was full of markup. Then I found that a JS file contains this code:

function GetProductListHeader() {
    var startPage = GetStartPage();
    if (pageName == 'kategori' || pageName == 'reyon') {
        var BrandList = GetQueryStringByName("Brand");
        var ColorList = GetQueryStringByName("Color");
        var PropList = GetQueryStringByName("propid");
        var ItemDim1CodeList = GetQueryStringByName("vcode");
        var QPrice = GetQueryStringByName("price");
        var cFilter = GetQueryStringByName("cfilter");

        var parametre = { PageName: pageName, pUrl: PageUrl, BrandList: BrandList, ColorList: ColorList, ItemDim1CodeList: ItemDim1CodeList, PropList: PropList, QPrice: QPrice, cFilter: cFilter, startPage: startPage };
        $.ajax({
            url: '/WS/wsProduct.asmx/GetProductListHeader',
            type: 'POST',
            processData: false,
            contentType: 'application/json; charset=utf-8',
            data: JSON.stringify(parametre),
            dataType: 'json',
            async: true
        }).done(function (e) {
            if (e.d != "") {
                $('.coll-2').html(e.d);
                GetProductList(startPage);
            }
        });
    }
}

Is there any way to get this data with PHP?

Thank you.

Edit: I tried to set up the cURL command after copying it from the Chrome network tab. I used the script below:

$html = 'curl "http://bebekbayi.com/WS/wsProduct.asmx/GetProductList" \ 
    -H "Cookie: ASP.NET_SessionId=wy5hyt1bujcrdka2hpbp2wnm; _gat=1; _ga=GA1.2.1204447549.1447830812" \ 
    -H "Origin: http://bebekbayi.com" \ 
    -H "Accept-Encoding: gzip, deflate" \ 
    -H "Accept-Language: tr-TR,tr;q=0.8,en-US;q=0.6,en;q=0.4" \ 
    -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36" \ 
    -H "Content-Type: application/json; charset=UTF-8" \ 
    -H "Accept: application/json, text/javascript, */*; q=0.01" \ 
    -H "Cache-Control: max-age=0" \ 
    -H "X-Requested-With: XMLHttpRequest" \ 
    -H "Connection: keep-alive" \ 
    -H "Referer: http://bebekbayi.com/kategori/bakim-cantalari" 
    --data-binary "{""PageName"":""kategori"",""pUrl"":""bakim-cantalari"",""pIndex"":1,""BrandList"":"""",""ColorList"":"""",""ItemDim1CodeList"":"""",""PropList"":"""",""QPrice"":"""",""cFilter"":""""}" --compressed';
exec($html,$result);
   foreach($result as $res){

       echo $res . '<br>'; 
   }

It returned: [InvalidOperationException: Request format is unrecognized for URL unexpectedly ending in '/GetProductList'.]

Upvotes: 0

Views: 347

Answers (1)

NIlay Mehta

Reputation: 476

I think your task is now easier, since you have found the data source directly.

You can take the full URL of the web service and call it from PHP with cURL.

You will then get the response. Generally it will be XML, but that depends on how the web service is written.
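In this case the page's own script sends JSON and reads `e.d`, so the response is probably a JSON object whose `d` key holds an HTML fragment. Below is a minimal sketch of unpacking such a response, assuming that shape. The sample markup and the helper names `extractHtmlFragment` and `extractLinks` are hypothetical; the real markup has to be checked against an actual response.

```php
<?php
// Hypothetical sample of what the service might return; the real markup will differ.
$sampleResponse = json_encode([
    'd' => '<div class="product"><a href="/urun/sample-bag">Sample Bag</a></div>',
]);

/**
 * Returns the HTML fragment stored under the "d" key of an ASMX-style
 * JSON response, or null when the body is not JSON or has no string "d".
 */
function extractHtmlFragment(string $json): ?string
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded) || !isset($decoded['d']) || !is_string($decoded['d'])) {
        return null;
    }
    return $decoded['d'];
}

/**
 * Collects the href and text of every <a> element in an HTML fragment.
 */
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    // Collect libxml warnings internally instead of printing them for sloppy markup.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

$fragment = extractHtmlFragment($sampleResponse);
if ($fragment !== null) {
    print_r(extractLinks($fragment));
}
```

Prices could be collected the same way once the element that carries them in the real fragment is known.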

Here is the code:

$html = "curl 'http://bebekbayi.com/WS/wsProduct.asmx/GetProductList' -H 'Origin: http://bebekbayi.com' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.8' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36' -H 'Content-Type: application/json; charset=UTF-8' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Referer: http://bebekbayi.com/reyon/Anne' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' --data-binary '{\"PageName\":\"reyon\",\"pUrl\":\"Anne\",\"pIndex\":1,\"BrandList\":\"\",\"ColorList\":\"\",\"ItemDim1CodeList\":\"\",\"PropList\":\"\",\"QPrice\":\"\",\"cFilter\":\"\"}' --compressed";
exec($html,$result);
// exec() fills $result with one array element per output line; join them before decoding.
$obj = json_decode(implode("", $result), true);
print_r($obj);
exit;
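The same request can also be made with PHP's cURL extension instead of shelling out through exec(). This avoids the shell quoting and line-continuation pitfalls that can silently drop headers or the body from a pasted command. The following is a sketch, not a tested client for this site: the field names are copied from the captured request above, the helper names `buildPayload` and `postJson` are hypothetical, and the `d` key in the usage lines is an assumption based on how the page's script reads `e.d`.

```php
<?php
/**
 * Builds the JSON body for a listing request. Field names are taken from the
 * captured browser request; the filter fields are left empty.
 */
function buildPayload(string $pageName, string $pUrl, int $pIndex = 1): string
{
    return json_encode([
        'PageName' => $pageName,
        'pUrl' => $pUrl,
        'pIndex' => $pIndex,
        'BrandList' => '',
        'ColorList' => '',
        'ItemDim1CodeList' => '',
        'PropList' => '',
        'QPrice' => '',
        'cFilter' => '',
    ]);
}

/**
 * POSTs a JSON body and returns the decoded response, or null on a
 * transport error, a non-200 status, or a body that is not JSON.
 */
function postJson(string $url, string $json): ?array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $json,        // a string body is sent as-is
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_ENCODING => '',             // accept any compression cURL supports
        CURLOPT_TIMEOUT => 15,
        CURLOPT_HTTPHEADER => [
            'Content-Type: application/json; charset=UTF-8',
            'Accept: application/json, text/javascript, */*; q=0.01',
            'X-Requested-With: XMLHttpRequest',
        ],
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false || $status !== 200) {
        return null;
    }
    $decoded = json_decode($body, true);
    return is_array($decoded) ? $decoded : null;
}

// Usage (commented out so the sketch makes no network call when loaded):
// $response = postJson(
//     'http://bebekbayi.com/WS/wsProduct.asmx/GetProductList',
//     buildPayload('kategori', 'bakim-cantalari')
// );
// if ($response !== null && isset($response['d'])) {   // "d" is an assumption
//     echo $response['d'];
// }
```

Building the body with json_encode() keeps the JSON valid without manual escaping, and changing `pIndex` should request further pages if the service paginates the way the captured request suggests.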

Upvotes: 2
