Reputation: 1209
I try to scrape data of this website: http://ntthnue.edu.vn/tracuudiem
First, when I insert the SBD field with data 'TS4740', I can successfully get the result. However, when I try to run this code:
Here is my PHP cURL code:
<?php
function getData($id) {
$url = 'http://ntthnue.edu.vn/tracuudiem';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, ['sbd' => $id]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
echo getData('TS4740');
I just got the old page. Can anybody explain why? Thank you!
Upvotes: 3
Views: 3512
Reputation: 4049
Make sure you add all the necessary headers and input data. The server that is processing this request can do all kinds of checks to see if it's a "valid" form request. As such you need to spoof the request to be as close to a regular browser request as possible.
Use tools like Chrome Dev Tools to see both the request and respons headers that are sent between the server and your browser to better understand what you curl setup should be like. And further use a app like Postman to make the request simulation super easy and to see what works and not.
<?php
function getData($id) {
$url = 'http://ntthnue.edu.vn/tracuudiem';
$ch = curl_init($url);
$postdata = 'namhoc=2015-2016&kythi_name=Tuy%E1%BB%83n+sinh+v%C3%A0o+l%E1%BB%9Bp+10&hoten=&sbd='.$id.'&btnSearch=T%C3%ACm+ki%E1%BA%BFm';
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Origin: http://ntthnue.edu.vn',
'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36',
'Content-Type: application/x-www-form-urlencoded',
'Referer: http://ntthnue.edu.vn/tracuudiem',
));
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
echo getData('TS4740');
Upvotes: 5