josh

Reputation: 10348

Attempt to get HTML from website using PHP cURL does not work

I am attempting to write a script that retrieves the HTML from my school's schedule search page. The page loads normally in a browser, but when I request it with cURL, I get the HTML of the page it redirects to instead. When I changed the CURLOPT_FOLLOWLOCATION option from true to false, the script only output the response headers followed by a blank page.

For reference, my PHP code is

<?php
$curl_connection = curl_init('https://www.registrar.usf.edu/ssearch/');

curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");

$result = curl_exec($curl_connection);

print $result;

?>

The page I am trying to fetch with cURL is https://www.registrar.usf.edu/ssearch/ or https://www.registrar.usf.edu/ssearch/search.php

Any ideas?

Upvotes: 0

Views: 1489

Answers (1)

Kishor

Reputation: 1513

I added two more lines so that cURL now saves cookies. The site uses them to decide whether to redirect you when you try to scrape the schedule page.

$curl_connection = curl_init();
$url = "https://www.registrar.usf.edu/ssearch/search.php";
curl_setopt($curl_connection, CURLOPT_URL, $url);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookie.txt');  // file that cookies received from the site are written to
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookie.txt'); // file that cookies are read from and sent back on later requests
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
$result = curl_exec($curl_connection);
echo $result;

Also, I haven't seen anyone pass the URL to curl_init before. It does accept an optional URL argument, but here I set it with CURLOPT_URL instead.

Here is the resulting cookie file:

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

www.registrar.usf.edu   FALSE   /   FALSE   0   PHPSESSID   eied78t0v1qlqcop0rdk214361
www.registrar.usf.edu   FALSE   /ssearch/   FALSE   1336718465  cookie_test cookie_set
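
To make the columns of that file easier to read, here is a minimal sketch that parses a Netscape-format cookie jar into named fields. The function name parse_cookie_jar is hypothetical (it is not part of the cURL extension), and the sketch assumes the usual seven tab-separated columns: domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp (0 means a session cookie), name, and value.

```php
<?php
// Hypothetical helper, not part of PHP or cURL: parses the text of a
// Netscape-format cookie file (as written via CURLOPT_COOKIEJAR).
function parse_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // libcurl marks HttpOnly cookies with a "#HttpOnly_" prefix on the domain.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }

        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line in the assumed 7-column layout
        }

        $cookies[] = [
            'domain'             => $fields[0],
            'include_subdomains' => $fields[1] === 'TRUE',
            'path'               => $fields[2],
            'secure'             => $fields[3] === 'TRUE',
            'expires'            => (int) $fields[4], // 0 = session cookie
            'name'               => $fields[5],
            'value'              => $fields[6],
        ];
    }
    return $cookies;
}
```

For example, print_r(parse_cookie_jar(file_get_contents('cookie.txt'))); should list the PHPSESSID and cookie_test entries shown above, provided the file on disk is tab-separated as libcurl writes it.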

If you ever need to debug a cURL request that is not working, start with var_dump(curl_getinfo($curl_connection)); and then check curl_error($curl_connection);
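
As a rough sketch of that debugging routine, the two calls can be wrapped in one function that returns the body together with the diagnostics. The function name fetch_with_diagnostics and the shape of the returned array are my own assumptions for illustration. Only curl_getinfo, curl_error and curl_errno come from the cURL extension.

```php
<?php
// Hypothetical helper: performs a GET request and collects the
// diagnostics mentioned above alongside the response body.
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects so redirect_count is meaningful
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);

    $body = curl_exec($ch); // false when the transfer failed

    return [
        'body'  => $body,
        'info'  => curl_getinfo($ch), // includes url, http_code, redirect_count, total_time
        'errno' => curl_errno($ch),   // 0 when there was no error
        'error' => curl_error($ch),   // empty string when there was no error
    ];
}
```

A typical check would be to look at $result['error'] first and, if it is empty, compare $result['info']['url'] and $result['info']['http_code'] against what you expected. A final URL that differs from the requested one indicates that a redirect was followed.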

Upvotes: 3
