Sathishkumar Manoharan

Reputation: 301

How can I fetch huge numbers of records using Laravel and MySQL?

I need expert suggestions and solutions. We are developing a job portal website that handles around 1 million records, and we are facing timeout errors when fetching them. How can I handle those records using Laravel and MySQL?

We have already tried the following steps:

  1. Increasing the PHP execution time
  2. MySQL indexing
  3. Pagination
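A note on points 2 and 3: with a million-row table, plain `OFFSET` pagination gets slower the deeper the page, because MySQL still scans and discards every skipped row. Keyset ("seek") pagination on an indexed id column avoids that; in Laravel the equivalent is `chunkById()`. A minimal standalone sketch of the idea (the `jobs` table and 250 demo rows are invented for illustration, and in-memory SQLite stands in for MySQL so it runs anywhere):

```php
<?php
// Keyset ("seek") pagination sketch: instead of OFFSET n (which scans
// and discards n rows per page), remember the last seen id and ask for
// WHERE id > :last, which uses the primary-key index directly.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT)');

$insert = $pdo->prepare('INSERT INTO jobs (title) VALUES (?)');
for ($i = 1; $i <= 250; $i++) {
    $insert->execute(["Job $i"]);
}

$pageSize = 100;
$lastId = 0;       // cursor: id of the last row seen so far
$fetched = 0;
do {
    $stmt = $pdo->prepare(
        'SELECT id, title FROM jobs WHERE id > ? ORDER BY id LIMIT ?'
    );
    $stmt->execute([$lastId, $pageSize]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        $lastId = $row['id'];   // advance the cursor
        $fetched++;
    }
} while (count($rows) === $pageSize);  // a short page means we're done

echo $fetched, "\n"; // 250
```

In Laravel, `Job::chunkById(100, fn ($jobs) => ...)` implements the same pattern and is a drop-in replacement for offset-based `chunk()`.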

Upvotes: 6

Views: 26456

Answers (5)

Md. Saifur Rahman

Reputation: 404

Challenge:

Downloading a CSV file with over 30 million records.

Solution:

Use Laravel job chains to manage the processing of the 30 million records, storing the data in 30 CSV files (1M rows in each) and downloading them as a single zip file.

In Controller

public function trans_recharge_rep_dl_custm_csv_chunk_queue(Request $request)
    {
        $chunkSize = 25000;  // 25k rows per chunk for safety; the server can handle up to ~70k rows (~134 MB) per running process
        $recharge_archives = TransactionReportFacade::trans_recharge_rep_process($request);
        $totalCount = $recharge_archives->count();
        $numberOfChunks = ceil($totalCount / $chunkSize);
        $fileName = "file_".Str::slug(getNow()."-".rand(1,999)).".csv";
        $report_type = $request->report_type=='recharge_report' ? 'recharge' : 'transactions';
        $file_base_url = "/uploads/reports/".$report_type."/".getToday()."/csv";
        $file_full_base_url = $_SERVER['DOCUMENT_ROOT']."/uploads/reports/".$report_type."/".getToday()."/csv";
        $file_url = $file_base_url."/".$fileName;
        $file_full_url = $file_full_base_url."/".$fileName;
        $report_type_name = ($request->report_type=='recharge_report' ? 'Recharge Report':'Transaction Report');

        $data = $totalCount>1000000 ?
                    TransactionReportFacade::transaction_rep_download_more_than_1m($request, $totalCount, $chunkSize, $numberOfChunks, $fileName, $file_base_url, $file_full_base_url, $file_url, $file_full_url, $report_type_name)
                    :
                    TransactionReportFacade::transaction_rep_download_less_or_eq_1m($request, $totalCount, $chunkSize, $numberOfChunks, $fileName, $file_base_url, $file_full_base_url, $file_url, $file_full_url, $report_type_name);

        return $this->set_response($data, 200,'success', ['Transaction report file is being generated. Please allow us some time. You may download the csv file in notification section.'], $request->merge(['log_type_id' => 3]));
    }

In Facade / Helper

<?php

namespace App\Facades\Modules\Report\TransactionReport;

use ZipArchive;
use App\Models\RechargeArchives;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Bus;
use App\Jobs\TransactionReportCSVChunk;
use Illuminate\Support\Facades\File;


class TransactionReportHelper
{
    public function trans_recharge_rep_process($request)
    {    
        $recharge_daily_tbl = RechargeArchives::with('reduction_list:id,ruid');

        if ($request->report_type == 'recharge_report') {
            if (isset($request->request_date_from)) {
                $recharge_daily_tbl = $recharge_daily_tbl->where('recharge_archives.recharge_date', '>=', $request->request_date_from);
            }

            if (isset($request->request_date_to)) {
                $recharge_daily_tbl = $recharge_daily_tbl->where('recharge_archives.recharge_date', '<=', $request->request_date_to);
            }
        } else {
            if (isset($request->request_date_from)) {
                $recharge_daily_tbl = $recharge_daily_tbl->where('recharge_archives.request_date', '>=', $request->request_date_from);
            }

            if (isset($request->request_date_to)) {
                $recharge_daily_tbl = $recharge_daily_tbl->where('recharge_archives.request_date', '<=', $request->request_date_to);
            }
        }

        return $recharge_daily_tbl;
    }

    // transaction report: less than or equal to 1 million records
    public function transaction_rep_download_less_or_eq_1m($request, $totalCount, $chunkSize, $numberOfChunks, $fileName, $file_base_url, $file_full_base_url, $file_url, $file_full_url, $report_type_name)
    {
        $batches  = [];
        for($i=1; $i<=$numberOfChunks; $i++)
        {
            $offset = ($i * $chunkSize) - $chunkSize;
            $batches[] = new TransactionReportCSVChunk(
                [
                    'request' => $request->all(),
                    'offset' => $offset, 'totalCount' => $totalCount, 'chunkSize' => $chunkSize, 'numberOfChunks' => $numberOfChunks, 'fileName' => $fileName, 'file_base_url' => $file_base_url, 'file_full_base_url' => $file_full_base_url,  'file_url' => $file_url, 'file_full_url' => $file_full_url, 'user_id'  =>  auth()->user()->id, 'report_type_name' => $report_type_name
                ]
            );
        }

        Bus::chain($batches)->dispatch();
        $data = [
            'download' => $file_url
        ];
        return $data;
    }

    // transaction report: more than 1 million records
    public function transaction_rep_download_more_than_1m($request, $totalCount, $chunkSize, $numberOfChunks, $fileName, $file_base_url, $file_full_base_url, $file_url, $file_full_url, $report_type_name)
    {
        // $fileName is the master filename; it is split into one numbered filename per 1M-row CSV
        $files_count = (int) ceil($totalCount/1000000);
        $files = [];  // each file contains 1m data

        $zip_file_name = substr($fileName, 0, -3).'zip';
        $zip_file_url = substr($file_url, 0, -3).'zip';
        $zip_file_full_base_url = $file_full_base_url;
        $zip_file_full_url = substr($file_full_url, 0, -3).'zip';

        for($i = 0; $i < $files_count; $i++)
        {
            $files[] = $i.'-'.$fileName;
        }

        $batches  = [];

        for($i=1; $i<=$numberOfChunks; $i++)
        {
            $offset = ($i * $chunkSize) - $chunkSize;

            $fileName = $files[intdiv($offset, 1000000)]; // switch to the file covering the current 1M-row block
            $file_url = $file_base_url.'/'.$fileName; // relative URL rebuilt for the current file
            $file_full_url = $file_full_base_url.'/'.$fileName; // absolute path rebuilt for the current file

            $batches[] = new TransactionReportCSVChunk(
                [
                    'request' => $request->all(),
                    'offset' => $offset, 'totalCount' => $totalCount, 'chunkSize' => $chunkSize, 'numberOfChunks' => $numberOfChunks, 'fileName' => $fileName, 'file_base_url' => $file_base_url, 'files_count' => $files_count, 'file_full_base_url' => $file_full_base_url,  'file_url' => $file_url, 'file_full_url' => $file_full_url, 'user_id'  =>  auth()->user()->id, 'report_type_name' => $report_type_name, 'files' => $files, 'zip_file_name' => $zip_file_name, 'zip_file_url' => $zip_file_url, 'zip_file_full_base_url' => $zip_file_full_base_url, 'zip_file_full_url' => $zip_file_full_url
                ]
            );
        }
        Bus::chain($batches)->dispatch();
        $data = [
            'download' => $zip_file_url
        ];
        return $data;
    }

}
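The offset/file arithmetic the helper relies on can be checked in isolation. This standalone sketch (with an assumed total of 2.3M rows, a figure chosen only for the demonstration) confirms that the chunk offsets tile the dataset exactly and that `intdiv(offset, 1000000)` routes each chunk to the right 1M-row CSV:

```php
<?php
// Verify the chunk/offset/file mapping used in the facade above:
//   offset of chunk i  = (i * chunkSize) - chunkSize
//   target file index  = intdiv(offset, 1_000_000)
$totalCount = 2_300_000;
$chunkSize  = 25_000;
$numberOfChunks = (int) ceil($totalCount / $chunkSize); // 92 chunks
$filesCount     = (int) ceil($totalCount / 1_000_000);  // 3 CSV files

$rowsPerFile = array_fill(0, $filesCount, 0);
for ($i = 1; $i <= $numberOfChunks; $i++) {
    $offset = ($i * $chunkSize) - $chunkSize;
    $fileIndex = intdiv($offset, 1_000_000);
    // rows this chunk actually contributes (last chunk may be short)
    $rowsPerFile[$fileIndex] += min($chunkSize, $totalCount - $offset);
}

echo $numberOfChunks, ' ', implode(',', $rowsPerFile), "\n";
// 92 1000000,1000000,300000
```

The first two files each receive exactly 1M rows and the last file the 300k remainder, which is why the zip step can assume one complete CSV per 1M-row block.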

In Job Queue

<?php

namespace App\Jobs;

use ZipArchive;
use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\File;
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use App\Facades\Modules\Report\TransactionReport\TransactionReportFacade;

class TransactionReportCSVChunk implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    protected $data;

    public function __construct($data = [])
    {
        $this->data = $data;
    }

    public function handle()
    {
        $zip = new ZipArchive();

        $process_start_time = getTodayDateTime();

        $request = (object) $this->data['request'];
        $offset = $this->data['offset'] ;
        $totalCount = $this->data['totalCount'] ;
        $chunkSize = $this->data['chunkSize'] ;
        $numberOfChunks = $this->data['numberOfChunks'] ;
        $fileName = $this->data['fileName'] ;
        $file_base_url = $this->data['file_base_url'] ;
        $file_full_base_url = $this->data['file_full_base_url'] ;
        $file_url = $this->data['file_url'] ;
        $file_full_url = $this->data['file_full_url'] ;
        $user_id = $this->data['user_id'] ;
        $report_type_name = $this->data['report_type_name'] ;
        $files = $this->data['files'] ?? [] ;
        $zip_file_name = $this->data['zip_file_name'] ?? '' ;
        $zip_file_url = $this->data['zip_file_url'] ?? '' ;
        $zip_file_full_base_url = $this->data['zip_file_full_base_url'] ?? '' ;
        $zip_file_full_url = $this->data['zip_file_full_url'] ?? '' ;

        $recharge_archives = TransactionReportFacade::trans_recharge_rep_process($request);


        writeToLog('======Queue started========'.'$offset='.$offset, 'debug');

        try {
            File::ensureDirectoryExists($file_full_base_url);

            // Open/Create the file
            $fp = fopen($file_full_url, 'a');

            if($offset==0)
            {
                $header = [
                    "Serial",
                    "Operator" ,
                    "Client" ,
                    ($request->report_type=='recharge_report' ? 'Recharge Date':'Request Date') ,
                    "Sender" ,
                    "Recipient" ,
                    "Amount" ,
                    "Status" ,
                    "Connection Type" ,
                ];
                // Write to the csv
                fputcsv($fp, $header);
            }


            $operator_info = DB::table('operator_info')->select('operator_id', 'operator_name')->get();
            $client_info = DB::table('client_info')->select('client_id', 'client_name')->get();
            $recharge_status_codes = DB::table('recharge_status_codes')->select('status_code', 'status_name')->get();

            $cursor = $recharge_archives
                        ->skip($offset)
                        ->take($chunkSize)
                        ->cursor();

            foreach ($cursor as $item)
            {
                $item_data = [
                                "Serial" => $offset+1,  //  First chunk 0, then 10000, 20000
                                "Operator" => $operator_info->where('operator_id', $item->operator_id)->pluck('operator_name')->first() ,
                                "Client" => $client_info->where('client_id', $item->client_id)->pluck('client_name')->first(),
                                ($request->report_type=='recharge_report' ? 'Recharge Date':'Request Date') => ($request->report_type=='recharge_report' ? $item->recharge_date:$item->request_date),
                                "Sender" => $item->request_sender_id,
                                "Recipient" => $item->recharge_recipient_msisdn,
                                "Amount" => $item->amount,
                                "Status" => $recharge_status_codes->where('status_code', $item->status_code)->pluck('status_name')->first(),
                                "Connection Type" => $item->connection_type,
                            ];
                fputcsv($fp, $item_data);
                $offset = $offset+1;
            }

            $process_end_time = getTodayDateTime();

            $user = DB::table('users_report')->where('id', $user_id)->first();

            // Close the file
            fclose($fp);


            // When all jobs/chunks have finished
            if ($offset>=$totalCount)
            {
                $file_type = 'text/csv'; // default csv format
                if($totalCount>1000000) // zipping applies only to datasets of more than 1M rows
                {
                    $file_url = $zip_file_url; // .csv to .zip file format
                    $file_type = 'application/zip'; // .csv to .zip file format for more than 1m data
                    $zip = new ZipArchive();

                    writeToLog('$zip_file_full_url = '.$zip_file_full_url, 'debug');
                    $zip->open($zip_file_full_url, ZipArchive::CREATE);

                    foreach ($files as $key => $csvfilename)
                    {
                        // Add the CSV file to the ZIP archive
                        writeToLog('$file_full_base_url $csvfilename '.$file_full_base_url.'/'. $csvfilename, 'debug');
                        $zip->addFile($file_full_base_url.'/'. $csvfilename, $csvfilename);
                    }
                    $zip->close();
                }
                reportlogsGenerate($fileName, $file_url, $user_id);
                $notification_response = [
                    'is_report' => 1 ,
                    'report_name' => $report_type_name,
                    'process_start_time' => $process_start_time,
                    'process_end_time' => $process_end_time,
                    'execution_time' => getTimeDuration($process_start_time, $process_end_time),
                    'no_of_records' => $offset,
                    'filePath' => $file_url,
                    "type" => $file_type,
                ];
                notificationsGenerate($notification_response, $user_id, $report_type_name.' is generated.', 'text/csv');
            }

            writeToLog('======Queue ended========'.'$offset='.$offset, 'debug');
        } catch (\Throwable $e) {
            report($e);
            return false;
        }
    }
}

Upvotes: 1

Farid Movsumov

Reputation: 12725

Laravel has a lazy feature for exactly this purpose. I tried both chunk and cursor. The cursor makes a single query and loads a lot of data into memory, which is not useful if you have millions of records in the DB. Chunk was also OK, but lazy is much cleaner in the way you write your code.

use App\Models\Flight;
 
foreach (Flight::lazy() as $flight) {
    //
}

Source: https://laravel.com/docs/9.x/eloquent#chunking-results
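The memory behaviour `lazy()` relies on can be sketched in plain PHP: it is backed by a generator, which materialises one row at a time instead of building the whole result set up front. A standalone illustration (the row shape is invented for the demo):

```php
<?php
// A generator yields one row at a time, so iterating 100k "rows"
// barely grows memory; an array-returning fetch would hold all of
// them at once. This is the mechanism behind Laravel's lazy().
function lazyRows(int $n): Generator {
    for ($i = 1; $i <= $n; $i++) {
        yield ['id' => $i];   // only this one row exists at a time
    }
}

$before = memory_get_usage();
$count = 0;
foreach (lazyRows(100_000) as $row) {
    $count++;
}
$grown = memory_get_usage() - $before;

echo $count, "\n";                                // 100000
echo $grown < 1_000_000 ? 'flat' : 'grew', "\n";  // memory stayed flat
```

In Laravel, `lazyById()` is a further refinement worth knowing: it pages by primary key rather than offset, so rows are not skipped or processed twice if the underlying table changes mid-iteration.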

Upvotes: 3

Kabubless

Reputation: 121

Hi, I think this might help:

use App\Models\User;
use Carbon\Carbon;
use Symfony\Component\HttpFoundation\StreamedResponse;

$users = User::groupBy('id')->orderBy('id', 'asc');

$response = new StreamedResponse(function () use ($users) {
    $handle = fopen('php://output', 'w');

    // Add the CSV header row
    fputcsv($handle, ['col1', 'Col 2']);

    // Stream the rows in chunks of 1000 so memory stays bounded
    $users->chunk(1000, function ($filtered_users) use ($handle) {
        foreach ($filtered_users as $user) {
            // Add a new row with this user's data
            fputcsv($handle, [$user->col1, $user->col2]);
        }
    });

    // Close the output stream
    fclose($handle);
}, 200, [
    'Content-Type' => 'text/csv',
    'Content-Disposition' => 'attachment; filename="Users'.Carbon::now()->toDateTimeString().'.csv"',
]);

return $response;

Upvotes: 3

user320487

Reputation:

You should be chunking results when working with large data sets. This lets you process smaller loads, reduces memory consumption, and allows you to return data to the user while the rest is being fetched/processed. See the Laravel documentation on chunking:

https://laravel.com/docs/5.5/eloquent#chunking-results

To further speed things up you can leverage multiprocessing and spawn concurrent processes that each handle a chunk at a time. Symfony's Symfony\Component\Process\Process class makes this easy to do.

https://symfony.com/doc/current/components/process.html
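A rough sketch of that fan-out pattern, written against PHP's built-in `popen()` so it runs without the component (Symfony's Process class wraps the same `proc_open` primitive with a nicer API). The `php -r 'echo N;'` child is a stand-in for a hypothetical `php artisan export:chunk --offset=N` command that would write one chunk of the report:

```php
<?php
// Fan out one worker process per chunk, then collect their results.
// Starting all children before reading any output is what makes them
// run concurrently rather than one after another.
$chunkOffsets = [0, 25000, 50000, 75000];
$procs = [];

// Start phase: each popen() launches its child immediately.
foreach ($chunkOffsets as $offset) {
    $cmd = sprintf("%s -r 'echo %d;'", PHP_BINARY, $offset);
    $procs[$offset] = popen($cmd, 'r');
}

// Collect phase: block on each child in turn and gather its output.
$results = [];
foreach ($procs as $offset => $p) {
    $results[$offset] = stream_get_contents($p);
    pclose($p);
}

echo implode(',', $results), "\n"; // 0,25000,50000,75000
```

With Symfony Process the start/collect split maps onto `$process->start()` and `$process->wait()`, plus `isSuccessful()` for per-chunk error handling.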

Upvotes: 13

Alexey Mezenin

Reputation: 163788

From the docs:

If you need to work with thousands of database records, consider using the chunk method. This method retrieves a small chunk of the results at a time and feeds each chunk into a Closure for processing. This method is very useful for writing Artisan commands that process thousands of records. For example, let's work with the entire users table in chunks of 100 records at a time:

DB::table('users')->orderBy('id')->chunk(100, function ($users) {
    foreach ($users as $user) {
        //
    }
});

Upvotes: 6
