Ryan H

Reputation: 2953

Approach needed for filtering aggregated data in Laravel / MySQL 8

I'm hoping someone can point me in the right direction here or share some approaches to a challenging feature I need to implement in my Laravel 11 project.

I have a model called StatisticAggregate which contains data columns such as total_submits (e.g. the number of times an application has been submitted). The data is aggregated into different buckets/periods, for instance hourly, daily and weekly.

I then have three models in a hierarchy: Company, Buyer and BuyerTier. A BuyerTier belongs to a Buyer, and a Buyer belongs to a Company.

A company could have 2 buyers, and each buyer could have 2 buyer tiers.
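The hierarchy above can be sketched roughly like this (the method names are my own shorthand, not necessarily the real ones):

```php
use Illuminate\Database\Eloquent\Model;

class StatisticAggregate extends Model
{
    // Each aggregate row points at a Company, Buyer or BuyerTier.
    public function modelable()
    {
        return $this->morphTo();
    }
}

class BuyerTier extends Model
{
    public function buyer()
    {
        return $this->belongsTo(Buyer::class);
    }
}

class Buyer extends Model
{
    public function company()
    {
        return $this->belongsTo(Company::class);
    }
}
```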

When an application is submitted, it creates a new StatisticAggregate entry for each period, linked to one of those models via a polymorphic relationship:

Schema::create('statistic_aggregates', function (Blueprint $table) {
    $table->ulid('id')->primary();
    $table->tinyInteger('for');
    $table->tinyInteger('product');
    $table->tinyInteger('country');
    $table->unsignedInteger('bucket');
    $table->mediumInteger('period')->default(60);
    $table->morphs('modelable');
    $table->char('key_hash', 32)->virtualAs("
        MD5(CONCAT(
            `for`, '',
            `product`, '',
            `country`, '',
            `bucket`, '',
            `period`, '',
            `modelable_type`, '',
            `modelable_id`, '',
            DATE_FORMAT(`bucket_starts_at`, '%Y-%m-%d %H:%i:%s'), '',
            DATE_FORMAT(`bucket_ends_at`, '%Y-%m-%d %H:%i:%s')
        ))
    ");

    $table->bigInteger('total_submits')->default(0);
    $table->dateTime('bucket_starts_at');
    $table->dateTime('bucket_ends_at');
    $table->timestamps();

    $table->index('bucket_starts_at'); // For date filtering...
    $table->index('bucket_ends_at'); // For date filtering...
    $table->index('created_at'); // For fast sorting...
    $table->index('key_hash'); // For upserting records...
    $table->index('period'); // For period categorisation...

    // To prevent duplicate rows...
    $table->unique([
        'for',
        'product',
        'country',
        'bucket',
        'period',
        'modelable_type',
        'modelable_id',
        'bucket_starts_at',
        'bucket_ends_at',
        'key_hash'
    ], 'unique_statistic_aggregates_index');
});
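
For context, an upsert into this table could look roughly like the sketch below (the payload values are illustrative, and note that key_hash is a virtual column, so it's never written directly — the unique index does the conflict detection):

```php
use App\Models\StatisticAggregate;
use Illuminate\Support\Facades\DB;

StatisticAggregate::upsert(
    [[
        'for'              => 1,
        'product'          => 1,
        'country'          => 1,
        'bucket'           => 1736078400,
        'period'           => 60,
        'modelable_type'   => \App\Models\Company::class,
        'modelable_id'     => 1,
        'total_submits'    => 1,
        'bucket_starts_at' => '2025-01-05 12:00:00',
        'bucket_ends_at'   => '2025-01-05 13:00:00',
    ]],
    // Conflict target: the non-virtual columns of the unique index.
    // (On MySQL this argument is effectively ignored; any unique index triggers
    // ON DUPLICATE KEY UPDATE.)
    ['for', 'product', 'country', 'bucket', 'period',
     'modelable_type', 'modelable_id', 'bucket_starts_at', 'bucket_ends_at'],
    // On conflict, add to the existing counter instead of overwriting it.
    ['total_submits' => DB::raw('`total_submits` + VALUES(`total_submits`)')]
);
```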

When a single application is submitted, I'd end up with the following rows for the hourly, daily and weekly buckets. Let's assume the application goes through 2 buyers, each with a tier: since the inserts happen in a foreach behind the scenes, the company row shows 2 submits (one per buyer). Here's roughly what it would look like:

- for: the report type enum
- product: the product type enum
- country: the country enum
- bucket: Unix timestamp
- period: the bucket length in minutes (60 = hourly, 1440 = daily, etc.)
- modelable: the polymorphic model

| id | for | product | country | bucket | period | modelable_type | modelable_id | key_hash | total_submits | bucket_starts_at | bucket_ends_at | created_at | updated_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01jggwt9rq2zmsr4vvgwdj9643 | 1 | 1 | 1 | 1736078400 | 60 | App\Models\Company | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 2 | 2025-01-05 12:00:00 | 2025-01-05 13:00:00 | 2025-01-05 12:00:00 | 2025-01-05 12:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9644 | 1 | 1 | 1 | 1736035200 | 1440 | App\Models\Company | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 2 | 2025-01-05 00:00:00 | 2025-01-05 23:59:59 | 2025-01-05 00:00:00 | 2025-01-05 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9645 | 1 | 1 | 1 | 1735516800 | 10080 | App\Models\Company | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 2 | 2024-12-01 00:00:00 | 2025-01-05 23:59:59 | 2024-12-01 00:00:00 | 2024-12-01 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9646 | 1 | 1 | 1 | 1736078400 | 60 | App\Models\Buyer | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2025-01-05 12:00:00 | 2025-01-05 13:00:00 | 2025-01-05 12:00:00 | 2025-01-05 12:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9647 | 1 | 1 | 1 | 1736035200 | 1440 | App\Models\Buyer | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2025-01-05 00:00:00 | 2025-01-05 23:59:59 | 2025-01-05 00:00:00 | 2025-01-05 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9648 | 1 | 1 | 1 | 1735516800 | 10080 | App\Models\Buyer | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2024-12-01 00:00:00 | 2025-01-05 23:59:59 | 2024-12-01 00:00:00 | 2024-12-01 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9649 | 1 | 1 | 1 | 1736078400 | 60 | App\Models\BuyerTier | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2025-01-05 12:00:00 | 2025-01-05 13:00:00 | 2025-01-05 12:00:00 | 2025-01-05 12:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9650 | 1 | 1 | 1 | 1736035200 | 1440 | App\Models\BuyerTier | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2025-01-05 00:00:00 | 2025-01-05 23:59:59 | 2025-01-05 00:00:00 | 2025-01-05 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9651 | 1 | 1 | 1 | 1735516800 | 10080 | App\Models\BuyerTier | 1 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2024-12-01 00:00:00 | 2025-01-05 23:59:59 | 2024-12-01 00:00:00 | 2024-12-01 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9652 | 1 | 1 | 1 | 1736078400 | 60 | App\Models\Buyer | 2 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2025-01-05 12:00:00 | 2025-01-05 13:00:00 | 2025-01-05 12:00:00 | 2025-01-05 12:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9653 | 1 | 1 | 1 | 1736035200 | 1440 | App\Models\Buyer | 2 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2025-01-05 00:00:00 | 2025-01-05 23:59:59 | 2025-01-05 00:00:00 | 2025-01-05 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9654 | 1 | 1 | 1 | 1735516800 | 10080 | App\Models\Buyer | 2 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2024-12-01 00:00:00 | 2025-01-05 23:59:59 | 2024-12-01 00:00:00 | 2024-12-01 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9655 | 1 | 1 | 1 | 1736078400 | 60 | App\Models\BuyerTier | 2 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2025-01-05 12:00:00 | 2025-01-05 13:00:00 | 2025-01-05 12:00:00 | 2025-01-05 12:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9656 | 1 | 1 | 1 | 1736035200 | 1440 | App\Models\BuyerTier | 2 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2025-01-05 00:00:00 | 2025-01-05 23:59:59 | 2025-01-05 00:00:00 | 2025-01-05 00:00:00 |
| 01jggwt9rq2zmsr4vvgwdj9657 | 1 | 1 | 1 | 1735516800 | 10080 | App\Models\BuyerTier | 2 | bf2bc2545a4a5f5683d9ef3ed0d977e0 | 1 | 2024-12-01 00:00:00 | 2025-01-05 23:59:59 | 2024-12-01 00:00:00 | 2024-12-01 00:00:00 |

The problem I've got here is that my front-end shows the report as a nested table following the same hierarchy (Company → Buyer → BuyerTier).

So when I'm filtering for a particular BuyerTier, my main question is: how do I subtract from, or recalculate, that top-level Company row?

It holds the number 2 because there were 2 submits; it's aggregated. But when filtering, I don't want to have to manually loop over the buyers and rebuild the sum to populate the company row. Is there a way I can store some kind of new model that joins these together to figure out the delta? I'm operating a database with roughly 3 million entries per day, so an aggregated approach works well for the top level, but when filtering, how could my total_submits column know what went into it?
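
To illustrate the kind of manual rebuild I'd like to avoid: with this schema, the company-level number for a filtered view is effectively the sum of the matching child rows. Something like the sketch below (variable names are illustrative, not real code from my app):

```php
use Illuminate\Support\Facades\DB;

// Recompute the "Company" figure for a filtered set of BuyerTier ids,
// restricted to daily buckets in a date range, by summing the tier rows.
$companyTotal = DB::table('statistic_aggregates')
    ->where('modelable_type', \App\Models\BuyerTier::class)
    ->whereIn('modelable_id', $filteredTierIds)
    ->where('period', 1440)
    ->whereBetween('bucket_starts_at', [$from, $to])
    ->sum('total_submits');
```

This works, but it means every parent level in the nested table has to be re-derived from its children on the fly whenever a filter is applied.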

Individual rows are like their own entity and don't strictly care about one another here, but functionally, from an admin point of view, we know that a buyer belongs to a company. I can't just add foreign-key columns like company_id because the application has multi-tenancy-style global logic.

Just trying to get some thoughts here, so to summarise:

  1. When there's a model hierarchy, how can I make the top-level Company model's total_submits number reflect just the result of whatever I'm filtering on? If that column holds 1,000 submits that were programmatically built up from multiple buyers, how would I filter on it?
  2. Aggregated approaches like this work well for high-level filtering, and I need fast reporting across millions of rows.
  3. Maybe there's a way to link things together?

Upvotes: 0

Views: 61

Answers (0)
