user2902000

Reputation: 11

Parse a website, get all the links, and save them to a MySQL database

I'm working with PHP and MySQL, using the PHP Simple HTML DOM Parser. I have to parse a website's pages and fetch some content. To do that, I used the website's homepage as the initial URL and fetched all the anchor tags on that page.

Not every link is useful to me, so I filter the URLs with a regular expression. The links that pass the filter must be saved to my MySQL database.

My questions are:

  1. If I extract all the links (around 120,000) and try to save them to the MySQL database, I get the following error: Fatal error: Maximum execution time of 60 seconds exceeded in C:\xampp\htdocs\search-engine\index.php on line 12

  2. I can't store the data in the database.

  3. I can't filter the links.

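    include('mysql_connection.php');
    include('simplehtmldom_1_5/simple_html_dom.php');
    $website_name = "xyz.html/";

    $html = file_get_html($website_name);
    // One pass over the anchors is enough. Wrapping this loop in a loop
    // over every <div> re-scans the whole page once per <div>.
    foreach ($html->find('a') as $a_burrp)
    {
        $a1 = $a_burrp->href;   // keep the URL itself free of markup
        echo $a1 . '<br>';
        if (preg_match('/.+?event.+/', $a1, $a_match))
        {
            // String values take single quotes; backticks quote identifiers.
            $url  = mysql_real_escape_string($a1);
            $site = mysql_real_escape_string($website_name);
            mysql_query("INSERT INTO scrap_urls(url, website_name, date_added) VALUES('$url', '$site', now())");
        }
    }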
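
The filtering step described above can be tried on its own, without the parser or the database. The following is only a minimal sketch: `filter_links` and the sample hrefs are hypothetical and not part of the original code, and the pattern `/event/` is an assumption standing in for whatever expression fits the real site.

```php
<?php
// Hypothetical helper (not from the original post): keep only the hrefs
// that match a pattern, and drop duplicates.
function filter_links(array $hrefs, string $pattern): array
{
    $kept = [];
    foreach ($hrefs as $href) {
        $href = trim($href);
        if ($href !== '' && preg_match($pattern, $href) === 1) {
            $kept[] = $href;
        }
    }
    // array_unique() keeps the first occurrence of each value;
    // array_values() renumbers the result from 0.
    return array_values(array_unique($kept));
}

// Made-up sample hrefs, for illustration only.
$sample = [
    '/events/jazz-night',
    '/about',
    '/events/jazz-night',
    '/city/event-calendar',
];
$links = filter_links($sample, '/event/');
print_r($links);
```

Dropping duplicates before the insert loop also reduces the number of queries, which matters when a page repeats the same navigation links many times.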
    

Upvotes: 0

Views: 1483

Answers (3)

David P. P.

Reputation: 610

I usually work with PHP scripts that need some time to finish.

I always run those scripts either as a cron job or directly from the shell or command line using:

php script.php parameters

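That way I don't have to worry about the execution time: when PHP runs from the command line, max_execution_time defaults to 0, so the limit does not apply. There is a reason max_execution_time is usually set to 60 seconds or less for web requests.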
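
As a rough sketch of what such a command-line script could look like: the helper `start_url_from_args` and the usage message are made up for illustration, and the only assumption about the script is that it takes the start URL as its first parameter.

```php
<?php
// Hypothetical helper: read the start URL from the command-line arguments.
// $args[0] is the script name, so the first real parameter is $args[1].
function start_url_from_args(array $args): ?string
{
    return (isset($args[1]) && $args[1] !== '') ? $args[1] : null;
}

// PHP_SAPI is 'cli' when the script is started as "php script.php ...".
if (PHP_SAPI === 'cli') {
    $url = start_url_from_args($argv ?? []);
    if ($url === null) {
        fwrite(STDERR, "usage: php script.php <start-url>\n");
    } else {
        echo "Would crawl: $url\n";
    }
}
```

The same file can then be scheduled from cron, which starts it through the command-line interpreter as well.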

Regards.

Upvotes: 0

Roninio

Reputation: 1771

You are getting "Fatal error: Maximum execution time of 60 seconds exceeded" because of a configuration limit in PHP. You can raise that limit by adding a line like this at the top of your code:

set_time_limit(320);

More info: http://www.php.net/manual/en/function.set-time-limit.php

You can also raise max_execution_time in the php.ini file of your XAMPP installation.
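
To check that the new limit actually took effect, the value can be read back at runtime. This is a small sketch, assuming set_time_limit() has not been disabled on the server; 320 is simply the value used above.

```php
<?php
// Ask for up to 320 seconds for this run. The timer restarts from zero
// at the moment of the call.
set_time_limit(320);

// set_time_limit() changes max_execution_time for the current run only,
// so the effective value can be read back with ini_get().
$limit = (int) ini_get('max_execution_time');
echo "max_execution_time is now {$limit} seconds\n";
```

The php.ini equivalent is the line `max_execution_time = 320`, which applies to every script once Apache has been restarted.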

Upvotes: 2

wachme

Reputation: 2337

Actually, PHP is not the best tool for this. A PHP script is intended to perform a quick operation and return a response, whereas in your case the script may run for quite a long time. Although you can increase max_execution_time, I encourage you to use a technology that is more flexible than standard PHP, such as Python or JavaScript (Node.js).

Upvotes: 1
