kamal jot

Reputation: 29

Download HTML page with a different name

I need help downloading web pages from the internet using a PHP script. I already have a script that downloads a page, but it always saves the page under the same name, index.html.

I want each page to be saved under its own name from the URL. For example, the aboutus page should be saved as aboutus.html.

<!doctype html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Document</title>
    </head>
    <body>
    <form method="post">
        <input name="url" size="50" placeholder="URL" />
        <input name="submit" type="submit" />
    </form>
    </body>
    </html>
    <?php
    // maximum execution time in seconds
    set_time_limit (24 * 60 * 60);

    if (isset($_POST['submit'])) {

        $url = parse_url($_POST['url']);
        $folder = $url['host'];
        if (array_key_exists('path', $url)) {
            $file = explode('.', str_replace('/', '', $url['path']));
            $file .= '.html';
        } else {
            $file = 'index.html';
        }
        if (!sizeOf(glob($folder))) {
            mkdir($folder);
        }
        file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
    }
    ?>

Upvotes: 2

Views: 95

Answers (2)

Adolfo Garza

Reputation: 3029

Try this:

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<form method="post">
    <input name="url" size="50" placeholder="URL" />
    <input name="submit" type="submit" />
</form>
</body>
</html>
<?php
// maximum execution time in seconds
set_time_limit (24 * 60 * 60);

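function get_title($url){
  $titleText = 'index';   // fallback when the page has no usable <title>
  $str = file_get_contents($url);

  if($str !== false && strlen($str)>0){
    libxml_use_internal_errors(true);   // suppress warnings from malformed HTML
    $dom = new DOMDocument;
    $dom->loadHTML($str);
    $title = $dom->getElementsByTagName( "title" );
    if($title->length){
        // item(0) is the first <title> element
        $text = trim($title->item(0)->textContent);
        if($text !== ''){
            $titleText = $text;
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors(false);
  }
  // always return a string, even when the download failed
  return $titleText;
}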

if (isset($_POST['submit'])) {

    $url = parse_url($_POST['url']);
    $folder = $url['host'];
    if (array_key_exists('path', $url)) {
        $file = get_title($_POST['url']);
        $file .= '.html';
    } else {
        $file = 'index.html';
    }
    if (!sizeOf(glob($folder))) {
        mkdir($folder);
    }
    file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
}
?>
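
If you would rather name the file after the URL itself (so that `/aboutus` is saved as `aboutus.html`, as the question asks), something along these lines might work. This is only a sketch: `filename_from_url` is a made-up helper name, and replacing unusual characters with `_` is my own assumption, not part of the original script.

```php
<?php
// Hypothetical helper: derive a local file name from the path of a URL.
function filename_from_url(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);

    // No path, or a path ending in "/" (e.g. "/" or "/docs/"): use index.html
    if (!is_string($path) || $path === '' || substr($path, -1) === '/') {
        return 'index.html';
    }

    // Last path segment without its extension:
    // "/company/aboutus.php" gives "aboutus"
    $name = pathinfo($path, PATHINFO_FILENAME);

    // Assumption: keep only characters that are safe in a file name
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);

    return ($name === '' ? 'index' : $name) . '.html';
}
```

With that in place, `$file = filename_from_url($_POST['url']);` could replace the `if`/`else` that sets `$file`.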

Upvotes: 2

Peyman Mohamadpour

Reputation: 17964

Note

Needs PHP Simple HTML DOM Parser

According to

and contrary to the answer provided by Adolfo Garza, using regex is not a good idea for parsing HTML; use a DOM parser instead.

<?php
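function get_title( $url ){
    $html = new simple_html_dom();
    $html->load_file( $url );
    // find() with an index returns a single element (or null), not an array
    $title = $html->find( 'title', 0 );
    // fall back to 'index' when the page has no <title>
    return $title ? trim( $title->plaintext ) : 'index';
}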
if( isset( $_POST['submit'] ) ){
    $url = parse_url( $_POST['url'] );
    $folder = $url['host'];
    if( array_key_exists( 'path', $url ) ){
        $file = get_title( $_POST['url'] );
        $file .= '.html';
    }else{
        $file = 'index.html';
    }
    if( !sizeOf( glob( $folder ) ) ){
        mkdir( $folder );
    }
    file_put_contents($folder . '/' . $file, fopen($_POST['url'], 'r'));
}?>
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<form method="post">
    <input name="url" size="50" placeholder="URL" />
    <input name="submit" type="submit" />
</form>
</body>
</html>
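
One thing to watch with title-based names: a page title can contain characters such as `/` that are not valid in a file name. A rough sketch of cleaning the title before using it is below. The helper name `safe_filename`, the `index` fallback, and the choice of `-` as the replacement character are my assumptions, not part of either library.

```php
<?php
// Hypothetical helper: turn a page title into a file name that is safe to write.
function safe_filename(string $title, string $fallback = 'index'): string
{
    // Replace every run of characters other than letters, digits, ".", "_"
    // and "-" (this includes "/" and "\") with a single hyphen.
    // The /u flag makes the pattern treat the input as UTF-8.
    $name = preg_replace('/[^\p{L}\p{N}._-]+/u', '-', $title);

    // preg_replace() returns null on invalid UTF-8; the cast turns that into ''.
    // Leading/trailing hyphens and dots are stripped as well.
    $name = trim((string) $name, '-.');

    return ($name === '' ? $fallback : $name) . '.html';
}
```

It would be used as `$file = safe_filename( get_title( $_POST['url'] ) );` in place of appending `.html` by hand.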

Upvotes: 1
