Scrobot

Reputation: 1981

How to optimise handling of big data in Laravel?

My task is: "Take the transactions table, group the rows by transaction date, and count the statuses. These manipulations produce statistics, which are rendered on the page."

This is my method for generating those statistics:

public static function getStatistics(Website $website = null)
{
    if ($website === null) {
        return [];
    }

    // get() loads every matching row into memory at once.
    $rows = \DB::table('transactions')
        ->where('website_id', $website->id)
        ->orderBy('dt', 'desc')
        ->get();

    // Group the rows by (converted) transaction date.
    $transactions = collect(static::convertDate($rows))->groupBy('dt');
    $statistics = collect();

    foreach ($transactions as $date => $trans) {
        $subscriptions   = $trans->where('status', 'subscribe')->count();
        $unsubscriptions = $trans->where('status', 'unsubscribe')->count();
        $prolongations   = $trans->where('status', 'rebilling')->count();
        $redirections    = $trans->where('status', 'redirect_to_lp')->count();
        $conversion      = $redirections === 0 ? 0 : (float) ($subscriptions / $redirections);
        $earnings        = $trans->sum('pay');

        $statistics->push((object) [
            'date'            => $date,
            'subscriptions'   => $subscriptions,
            'unsubscriptions' => $unsubscriptions,
            'prolongations'   => $prolongations,
            'redirections'    => $redirections,
            'conversion'      => round($conversion, 2),
            'earnings'        => $earnings,
        ]);
    }

    return $statistics;
}

If the transaction count is below 100,000 rows, everything works fine. But once it goes above 150-200k, nginx throws a 502 Bad Gateway. What can you advise? I don't have any experience with big data handling. Maybe my implementation has a fundamental error?

Upvotes: 1

Views: 12042

Answers (3)

Igor

Reputation: 1111

The best way to handle huge database result sets (or arrays) in Laravel is to use LazyCollection (docs) with chunks. This is also discussed here.

<?php
$chunkSize = 500;

// Model::lazy() runs chunked queries under the hood but yields the
// models one at a time, so the callback receives a single MyModel,
// not a chunk -- memory stays flat either way.
MyModel::lazy($chunkSize)->each(function (MyModel $item) {
    // $item handling
});

// or, for an in-memory array:
// collect($bigDataArray)->lazy()->chunk($chunkSize)->each(function (LazyCollection $items) {...})

Upvotes: 0

Scrobot

Reputation: 1981

After several days of researching this question, I found the right answer:

Do NOT use PHP to crunch the raw data. It's better to do it in SQL!

In my case, we are using PostgreSQL.

Below is the SQL query that worked for me; maybe it will help someone else.

WITH cte_range(dt) AS (
    SELECT generate_series(
        '2016-04-01 00:00:00'::timestamp with time zone,
        '{$date} 00:00:00'::timestamp with time zone,
        INTERVAL '1 day'
    )
),

cte_data AS (
    SELECT
        date_trunc('day', dt) AS dt,
        COUNT(*) FILTER (WHERE status = 'subscribe')      AS count_subscribes,
        COUNT(*) FILTER (WHERE status = 'unsubscribe')    AS count_unsubscribes,
        COUNT(*) FILTER (WHERE status = 'rebilling')      AS count_rebillings,
        COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
        SUM(pay) AS earnings,
        CASE
            WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0
            THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float
                       / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
            ELSE 0
        END AS conversion_percent
    FROM transactions
    WHERE website_id = {$website->id}
    GROUP BY date_trunc('day', dt)
)

SELECT
    to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
    COALESCE(cte_data.count_subscribes, 0)      AS count_subscribes,
    COALESCE(cte_data.count_unsubscribes, 0)    AS count_unsubscribes,
    COALESCE(cte_data.count_rebillings, 0)      AS count_rebillings,
    COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
    COALESCE(cte_data.conversion_percent, 0)    AS conversion_percent,
    COALESCE(cte_data.earnings, 0)              AS earnings
FROM cte_range
LEFT JOIN cte_data ON cte_data.dt = cte_range.dt
ORDER BY cte_range.dt DESC
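
One caveat: {$date} and {$website->id} above are interpolated straight into the SQL string from PHP, which is an injection risk. Below is a minimal sketch of running the same query through Laravel's bindings instead ($date and $website are the same variables the original interpolates; the SQL is the query above with ? placeholders):

use Illuminate\Support\Facades\DB;

$sql = <<<'SQL'
WITH cte_range(dt) AS (
    SELECT generate_series('2016-04-01 00:00:00'::timestamptz,
                           ?::timestamptz, INTERVAL '1 day')
),
cte_data AS (
    SELECT date_trunc('day', dt) AS dt,
           COUNT(*) FILTER (WHERE status = 'subscribe')      AS count_subscribes,
           COUNT(*) FILTER (WHERE status = 'unsubscribe')    AS count_unsubscribes,
           COUNT(*) FILTER (WHERE status = 'rebilling')      AS count_rebillings,
           COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
           SUM(pay) AS earnings,
           CASE WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0
                THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float
                           / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
                ELSE 0 END AS conversion_percent
    FROM transactions
    WHERE website_id = ?
    GROUP BY date_trunc('day', dt)
)
SELECT to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
       COALESCE(cte_data.count_subscribes, 0)      AS count_subscribes,
       COALESCE(cte_data.count_unsubscribes, 0)    AS count_unsubscribes,
       COALESCE(cte_data.count_rebillings, 0)      AS count_rebillings,
       COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
       COALESCE(cte_data.conversion_percent, 0)    AS conversion_percent,
       COALESCE(cte_data.earnings, 0)              AS earnings
FROM cte_range
LEFT JOIN cte_data ON cte_data.dt = cte_range.dt
ORDER BY cte_range.dt DESC
SQL;

// The two ? placeholders: the upper bound of the date series, and the
// website filter. DB::select returns an array of stdClass rows.
$statistics = DB::select($sql, ["{$date} 00:00:00", $website->id]);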

Upvotes: 4

Josh

Reputation: 3288

Big data is never easy, but I would suggest using Laravel's chunk instead of get.

https://laravel.com/docs/5.1/eloquent (ctrl+f "::chunk")

What ::chunk does is select n rows at a time and let you process them bit by bit. This is convenient in that it lets you stream updates to the browser, but at the ~150k result range I would suggest looking into pushing this work into a background process instead of handling it on the request.
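
As a sketch only (assuming the transactions table has an auto-incrementing id column to order on, which the question doesn't show), the asker's get() call could be reworked like this:

use Illuminate\Support\Facades\DB;

$totals = [];

// chunk() fetches 1,000 rows per query instead of loading the whole
// result set with get(); the stable orderBy keeps the pages from
// shifting between queries.
DB::table('transactions')
    ->where('website_id', $website->id)
    ->orderBy('id')
    ->chunk(1000, function ($rows) use (&$totals) {
        foreach ($rows as $row) {
            // Assumes dt comes back as a 'YYYY-MM-DD ...' string.
            $day = substr($row->dt, 0, 10);
            $totals[$day][$row->status] = ($totals[$day][$row->status] ?? 0) + 1;
        }
    });

For the background-process route, the same loop would live in a queued job, with the finished statistics stored or cached for the page to read.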

Upvotes: 4
