pexea12

Reputation: 1209

Scrape website with javascript using cURL

I am trying to scrape data from this website: http://ntthnue.edu.vn/tracuudiem

When I enter 'TS4740' in the SBD field in the browser, I get the result successfully. However, when I run this PHP cURL code:

<?php

function getData($id) {
    $url = 'http://ntthnue.edu.vn/tracuudiem';
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, ['sbd' => $id]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $result = curl_exec($ch);

    curl_close($ch);

    return $result;
}

echo getData('TS4740');

I just get the original page back, without the result. Can anybody explain why? Thank you!

Upvotes: 3

Views: 3512

Answers (1)

Jocke Med Kniven

Reputation: 4049

Make sure you add all the necessary headers and input data. The server that processes this request can do all kinds of checks to see whether it is a "valid" form request. As such, you need to spoof the request so that it is as close to a regular browser request as possible.

Use tools like Chrome Dev Tools to see both the request and response headers that are sent between your browser and the server, to better understand what your cURL setup should look like. Then use an app like Postman to simulate the request easily and see what works and what does not.
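To make that comparison concrete, here is a minimal sketch that lists which headers the browser sent but cURL did not. The helper names `parseHeaderLines` and `missingHeaders` are hypothetical, not part of any library. It assumes you paste the raw request headers from Dev Tools as one string, and that you get the cURL side by setting `CURLINFO_HEADER_OUT` to true before `curl_exec()` and then reading `curl_getinfo($ch, CURLINFO_HEADER_OUT)`.

```php
<?php

// Hypothetical helper: turn a raw header block into [lowercase-name => value].
function parseHeaderLines(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n|\r/', trim($raw)) as $line) {
        $pos = strpos($line, ':');
        // Skip the request line ("POST /tracuudiem HTTP/1.1"), blank lines,
        // and anything that starts with a colon.
        if ($pos === false || $pos === 0) {
            continue;
        }
        $name  = strtolower(trim(substr($line, 0, $pos)));
        $value = trim(substr($line, $pos + 1));
        $headers[$name] = $value;
    }
    return $headers;
}

// Hypothetical helper: names of headers the browser sent but cURL did not.
function missingHeaders(string $browserRaw, string $curlRaw): array
{
    return array_keys(array_diff_key(
        parseHeaderLines($browserRaw),
        parseHeaderLines($curlRaw)
    ));
}
```

This only compares header names, not values, so a header such as Content-Type still needs a manual look if both sides send it with different values.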

Working example:

<?php

function getData($id) {
    $url = 'http://ntthnue.edu.vn/tracuudiem';
    $ch = curl_init($url);
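    // Form fields copied from the browser request. The fixed values are already URL-encoded, so only $id needs encoding.
    $postdata = 'namhoc=2015-2016&kythi_name=Tuy%E1%BB%83n+sinh+v%C3%A0o+l%E1%BB%9Bp+10&hoten=&sbd='.urlencode($id).'&btnSearch=T%C3%ACm+ki%E1%BA%BFm';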
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Origin: http://ntthnue.edu.vn',
        'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36',
        'Content-Type: application/x-www-form-urlencoded',
        'Referer: http://ntthnue.edu.vn/tracuudiem',
    ));

    $result = curl_exec($ch);

    curl_close($ch);

    return $result;
}

echo getData('TS4740');
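If you would rather not hand-encode the POST body, a sketch like the following builds an equivalent string with `http_build_query`. The function name `buildSearchBody` is hypothetical, and the field names and values are simply the ones from the example above. Note also that passing an array to `CURLOPT_POSTFIELDS`, as in the question, makes PHP send the data as multipart/form-data, while a URL-encoded string is sent as application/x-www-form-urlencoded.

```php
<?php

// Hypothetical helper (not part of the original answer): builds the same
// URL-encoded body as the hand-written $postdata string above.
function buildSearchBody(string $id): string
{
    $fields = [
        'namhoc'     => '2015-2016',
        // "Tuyển sinh vào lớp 10", written with \u{...} escapes (PHP 7+).
        'kythi_name' => "Tuy\u{1EC3}n sinh v\u{E0}o l\u{1EDB}p 10",
        // Empty strings are kept by http_build_query; null values would be dropped.
        'hoten'      => '',
        'sbd'        => $id,
        // "Tìm kiếm"
        'btnSearch'  => "T\u{EC}m ki\u{1EBF}m",
    ];

    // Pass '&' explicitly so the result does not depend on the
    // arg_separator.output ini setting.
    return http_build_query($fields, '', '&');
}

// Usage inside getData():
// curl_setopt($ch, CURLOPT_POSTFIELDS, buildSearchBody($id));
```

Building the body from an array keeps the field list readable and encodes `$id` for you, so an ID containing `&` or `=` cannot break the request.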

Upvotes: 5
