Čamo

Reputation: 4182

PHP CSV parser for 15000 rows optimization

I need to parse some CSV data and store it in the database. The input file is about 15,000 rows. The application is written in the Laravel framework. The database request takes about 0.2 s, so the problem seems to be in the CSV parser. Can somebody tell me how to optimize this PHP code? The code looks like this:

protected function csv2array(string $csv)
{
    try {
        $return = [
            'headers' => [],
            'rows' => [],
        ];
        $rows = explode(PHP_EOL, $csv);

        $headers = str_getcsv(strtolower(array_shift($rows)));  // Headers + strtolower()
        $return['headers'] = $headers;

        foreach ($rows as $row) {
            $items = str_getcsv($row);
            if ( count($items) !== count($headers) ) continue;

            $items[2] = new Carbon($items[2]);  // Third item is UTC datetime
            $items[3] = new Carbon($items[3]);  // Fourth item is UTC datetime

            $items = array_combine($headers, $items);

            $return['rows'][] = $items;
        }

        return $return;
    } catch (Exception $e) {
        Log::error($e->getMessage());
        throw $e;
    }
}

The parent code which calls csv2array() looks like this:

$csv = $request->getContent();
$csv = trim($csv);
$csvArray = $this->csv2array($csv);

$insertArray = $this->addDeviceIdToArray($csvArray['rows'], $device);  // This is fast 0.2s

Upvotes: -2

Views: 190

Answers (1)

Martijn

Reputation: 16123

My guess would be that your current setup loads everything in one go. You have too many rows for that to be an effective route: the 'easy fast' solution has its limitations.

IMO the most likely issue is the amount of data. You load 15,000 lines into $csv and then copy all of it into a huge array. PHP isn't really good at handling data that way.

My suggestion is to use fgetcsv:

protected function csvToArray()
{
    $return = ['headers' => [], 'rows' => []];
    $headers = [];

    $handle = fopen("test.csv", "r");
    $i = 0;
    while (($items = fgetcsv($handle)) !== false) {
        $i++;
        if ($i === 1) {
            $headers = $items;  // First row holds the headers
            $return['headers'] = $headers;
            continue;
        }

        $items[2] = new Carbon($items[2]);  // Third item is UTC datetime
        $items[3] = new Carbon($items[3]);  // Fourth item is UTC datetime

        $items = array_combine($headers, $items);
        $return['rows'][] = $items;
    }
    fclose($handle);

    return $return;
}

file_put_contents("test.csv", $request->getContent()); // write to file

$csvArray = $this->csvToArray();

That way you stream from a file, which is a lot less memory intensive: fgetcsv() reads one row at a time, so memory usage doesn't grow with file size.
(Please note that this is demo code; there is room for improvement.)
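If you want to avoid the round trip through a temp file entirely, the same streaming idea works with an in-memory stream. This is a sketch, not the exact code from the answer: it uses PHP's built-in `php://temp` wrapper (which spills to disk automatically for large payloads) and omits the Carbon date handling for brevity:

```php
<?php
// Sketch: stream-parse a CSV string via an in-memory stream,
// so the whole payload is never duplicated into intermediate arrays.
function csvToArrayFromString(string $csv): array
{
    $handle = fopen('php://temp', 'r+');  // in-memory stream, spills to disk if large
    fwrite($handle, $csv);
    rewind($handle);

    $return = ['headers' => [], 'rows' => []];

    $headers = fgetcsv($handle);  // first line: headers
    if ($headers === false) {
        fclose($handle);
        return $return;  // empty input
    }
    $return['headers'] = array_map('strtolower', $headers);

    while (($items = fgetcsv($handle)) !== false) {
        if (count($items) !== count($return['headers'])) {
            continue;  // skip malformed rows, as the original parser does
        }
        $return['rows'][] = array_combine($return['headers'], $items);
    }
    fclose($handle);

    return $return;
}
```

In the Laravel controller you would call it as `$csvArray = $this->csvToArrayFromString(trim($request->getContent()));` (here adapted as a plain function for illustration).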

Upvotes: 0
