lxndr

Reputation: 703

How to parse Large CSV file without timing out?

I'm trying to parse a 50 megabyte .csv file. The file itself is fine, but I'm trying to get past the massive timeout issues involved. Everything is set up upload-wise; I can easily upload and re-open the file, but after the browser times out I receive a 500 Internal Server Error.

My guess is I can save the file onto the server, open it, and keep a session value of what line I dealt with last. After a certain line I reset the connection via refresh and open the file at the line I left off with. Is this a doable idea? The previous developer made a very inefficient MySQL class that controls the entire site, so I don't want to write my own class if I don't have to, and I don't want to mess with his class.

TL;DR version: Is it efficient to save the last line I'm currently on of a CSV file that has 38K lines of products, and, after X number of rows, reset the connection and start from where I left off? Or is there another way to parse a large CSV file without timeouts?
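Something like the sketch below is what I have in mind (the products.csv path and the insertProduct() helper are placeholders; storing a byte offset from ftell() in the session would be simpler than counting lines):

<?php

// Sketch only: resume CSV processing across page refreshes,
// assuming the upload has already been saved server-side.
session_start();

$handle = fopen('products.csv', 'r'); // placeholder path
$offset = isset($_SESSION['csv_offset']) ? $_SESSION['csv_offset'] : 0;
fseek($handle, $offset); // jump to where the last request stopped

$processed = 0;
while(($row = fgetcsv($handle)) !== false && $processed < 1000)
{
    insertProduct($row); // placeholder: hand the row to the existing MySQL class
    $processed++;
}

if(!feof($handle))
{
    $_SESSION['csv_offset'] = ftell($handle); // remember the byte offset
    fclose($handle);
    header('Refresh: 0'); // reload the page to process the next chunk
    exit;
}

unset($_SESSION['csv_offset']); // done; clean up
fclose($handle);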

NOTE: It's the PHP script execution time that's the issue. Currently, at 38K lines, it takes about 46 minutes and 5 seconds to run via the command line. It works correctly 100% of the time when I take it out of the browser, suggesting that it is a browser timeout. Chrome's timeout is not editable as far as Google has told me, and changing Firefox's timeout rarely works.

Upvotes: 3

Views: 6158

Answers (3)

ehmad11

Reputation: 1395

I suggest running PHP from the command line and setting it as a cron job. This way you don't have to modify your code, there will be no timeout issue, and you can easily parse large CSV files.
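For example, something like this (the import.php name and the paths are placeholders):

<?php

// import.php - run from the command line:
//   php import.php /path/to/products.csv
// or schedule it in the crontab, e.g. nightly at 2am:
//   0 2 * * * /usr/bin/php /var/www/import.php /var/www/uploads/products.csv

set_time_limit(0); // 0 = no execution-time limit on the CLI

$path = isset($argv[1]) ? $argv[1] : die("Usage: php import.php <csv-file>\n");

$handle = fopen($path, 'r');
while(($row = fgetcsv($handle)) !== false)
{
    // hand each row to the existing database class here
}
fclose($handle);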

Also check this link.

Upvotes: 2

Knarf

Reputation: 1273

You could do something like this:

<?php

namespace database;

class importcsv
{
    private $crud;

    public function __construct($dbh, $table)
    {
        $this->crud = new \database\crud($dbh, $table);
    }

    public function import($columnNames, $csv, $separator)
    {
        $lines = explode("\n", $csv);
        $x = 0;

        foreach($lines as $line)
        {
            // Reset the execution-time counter on every row so the
            // script as a whole never hits the 30-second limit.
            \set_time_limit(30);
            $line = explode($separator, $line);

            $data = new \stdClass();

            foreach($line as $i => $item)
            {
                // Braces force $data->{columnName}; without them PHP 7+
                // parses this as ($data->$columnNames)[$i]
                if(isset($columnNames[$i]) && !empty($columnNames[$i]))
                    $data->{$columnNames[$i]} = $item;
            }

            $x++;

            $this->crud->create($data);
        }

        return $x; // number of rows imported
    }

    public function importFile($columnNames, $csvPath, $separator)
    {
        if(file_exists($csvPath))
        {
            $content = file_get_contents($csvPath);
            return $this->import($columnNames, $content, $separator);
        }
        else
        {
            return false; // file not found
        }
    }
}
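Usage would look something like this (the PDO credentials, table name, and column names are placeholders, and \database\crud is assumed to be your existing CRUD class):

<?php

$dbh = new \PDO('mysql:host=localhost;dbname=shop', 'user', 'pass'); // placeholder credentials
$importer = new \database\importcsv($dbh, 'products'); // placeholder table name

$columns = array('sku', 'name', 'price'); // placeholder column names
$imported = $importer->importFile($columns, '/tmp/products.csv', ',');
echo "Imported $imported rows\n";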

TL;DR: Calling \set_time_limit(30); every time you loop through a line might fix your timeout issues.

Upvotes: 3

srini.venigalla

Reputation: 5145

Your post is a little unclear due to the typos and grammar; could you please edit it?

If you are saying that the upload itself is okay but the delay is in the processing of the file, then the easiest thing to do is to parse the file in parallel using multiple threads. You can use the built-in Java Executor class, or Quartz, or Jetlang to do this:

  • Find the size of the file or number of lines.
  • Select a Thread load (Say 1000 lines per thread)
  • Start an Executor
  • Read the file in a loop.
  • For each 1000 lines, create a Runnable and load it into the Executor
  • Start the Executor
  • Wait till all threads are finished

Each runnable does this (see the sketch after these steps):

  1. Fetch a connection
  2. Insert the 1000 lines
  3. Log the results
  4. Close the connection
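The answer above names Java tooling, but since the question is PHP, here is a rough stand-in for the same chunked pattern using pcntl_fork (requires the pcntl extension on a POSIX system; the path, credentials, and chunk size are placeholders):

<?php

// Sketch only: process a CSV in parallel chunks of 1000 lines each.
// In practice you would cap the number of concurrent children.
$lines = file('/tmp/products.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$chunks = array_chunk($lines, 1000); // "thread load" of 1000 lines per worker

$pids = array();
foreach($chunks as $chunk)
{
    $pid = pcntl_fork();
    if($pid === -1)
    {
        die("fork failed\n");
    }
    elseif($pid === 0)
    {
        // Child: fetch its own connection, insert its 1000 lines, exit.
        $dbh = new \PDO('mysql:host=localhost;dbname=shop', 'user', 'pass'); // placeholder
        foreach($chunk as $line)
        {
            // parse $line and insert it here
        }
        exit(0); // child must not fall through into the parent's loop
    }
    $pids[] = $pid; // parent: remember the child and keep forking
}

// Wait till all workers are finished.
foreach($pids as $pid)
{
    pcntl_waitpid($pid, $status);
}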

Upvotes: 2
