Ya Basha

Reputation: 1952

Laravel API with large datasets very slow to load

I have developed a Laravel application with API routes that load data from a MySQL database. The controller action executes a multi-join aggregate query and returns a collection; by default, Laravel converts the returned collection to JSON. Below is my controller action:

public function index()
{
    $admin0 = 2;

    // Monthly average price (scaled by /1000), grouped per commodity,
    // price type, measure unit, and admin area.
    return DB::table('commodities_monthly_prices as CMP')
        ->join('locations', 'CMP.location_id', '=', 'locations.id')
        ->join('global_admins', 'locations.admin0', '=', 'global_admins.admin0_code')
        ->join('commodities', 'CMP.commodity_id', '=', 'commodities.id')
        ->join('price_types', 'CMP.price_type_id', '=', 'price_types.id')
        ->join('measure_units', 'CMP.measure_unit_id', '=', 'measure_units.id')
        ->where('locations.admin0', $admin0)
        ->select('commodities.name AS commodity_name', 'price_types.name AS type_name',
            'measure_units.name AS unit_name', 'global_admins.admin0_code AS admin0',
            'global_admins.admin0_name AS country_name', 'global_admins.admin1_code AS admin1',
            'global_admins.admin1_name AS governorate_name')
        ->selectRaw("date(concat_ws('-', CMP.price_year, CMP.price_month, '01')) as date, ROUND(AVG(CMP.price) / 1000, 3) as price")
        ->groupBy('date', 'commodity_name', 'type_name', 'unit_name', 'global_admins.admin0_name',
            'global_admins.admin0_code', 'global_admins.admin1_name', 'global_admins.admin1_code')
        ->orderBy('date')
        ->orderBy('commodity_name')
        ->get();
}

The response contains more than 50,000 JSON objects and takes about 90 seconds to finish loading. The only thing that came to mind is to change the response from JSON to CSV (will this help?), since the front-end library accepts both JSON and CSV HTTP responses. If it helps, please tell me how to do it, and whether I need a third-party library.
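
For what it's worth, a CSV response can be streamed with Laravel's built-in response()->stream(), so no third-party library is needed. Below is a minimal sketch; buildPricesQuery() is a hypothetical helper that returns the same query builder chain as index(), and cursor() fetches rows one at a time instead of building the whole collection in memory. Note that CSV only trims serialization and payload overhead; the database query itself still costs the same.

public function indexCsv()
{
    return response()->stream(function () {
        $out = fopen('php://output', 'w');

        // Header row, matching the select() column order above.
        fputcsv($out, ['commodity_name', 'type_name', 'unit_name', 'admin0',
            'country_name', 'admin1', 'governorate_name', 'date', 'price']);

        // buildPricesQuery() is a hypothetical helper holding the same
        // builder chain as index(); cursor() streams one row at a time.
        foreach ($this->buildPricesQuery()->cursor() as $row) {
            fputcsv($out, (array) $row);
        }

        fclose($out);
    }, 200, [
        'Content-Type'        => 'text/csv',
        'Content-Disposition' => 'attachment; filename="prices.csv"',
    ]);
}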

Please advise me on how I can make this action load faster.

Thank you for your help. For reference, here is the underlying SQL query:

SELECT `commodities`.`name` AS `commodity_name`,
       `price_types`.`name` AS `type_name`,
       `measure_units`.`name` AS `unit_name`,
       `global_admins`.`admin0_code` AS `admin0`,
       `global_admins`.`admin0_name` AS `country_name`,
       `global_admins`.`admin1_code` AS `admin1`,
       `global_admins`.`admin1_name` AS `governorate_name`,
       ROUND(AVG(CMP.price) / 1000, 3) AS price
FROM `commodities_monthly_prices` AS `CMP`
INNER JOIN `locations` ON `CMP`.`location_id` = `locations`.`id`
INNER JOIN `global_admins` ON `locations`.`admin0` = `global_admins`.`admin0_code`
INNER JOIN `commodities` ON `CMP`.`commodity_id` = `commodities`.`id`
INNER JOIN `price_types` ON `CMP`.`price_type_id` = `price_types`.`id`
INNER JOIN `measure_units` ON `CMP`.`measure_unit_id` = `measure_units`.`id`
WHERE `locations`.`admin0` = ?
GROUP BY `CMP`.`price_year`,
         `CMP`.`price_month`,
         `commodity_name`,
         `type_name`,
         `unit_name`,
         `global_admins`.`admin0_name`,
         `global_admins`.`admin0_code`,
         `global_admins`.`admin1_name`,
         `global_admins`.`admin1_code`
ORDER BY `commodity_name` ASC

Upvotes: 0

Views: 637

Answers (1)

Rick James

Reputation: 142218

For starters, you need INDEX(admin0) on Locations.
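
A minimal sketch of that index; the index name is arbitrary:

ALTER TABLE locations ADD INDEX idx_admin0 (admin0);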

You don't need a datatype that goes into the billions for this: price_month int(10) unsigned DEFAULT NULL. INT takes 4 bytes; TINYINT takes only 1. Many of the other columns are also bigger than necessary, which costs disk space and query speed.
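
For example, assuming price_month only ever holds 1 through 12 (an assumption, since the data isn't shown), it could be shrunk like this; test on a copy of the table first:

ALTER TABLE commodities_monthly_prices
    MODIFY price_month TINYINT UNSIGNED DEFAULT NULL;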

If this is a "year", I hope you are not comparing it to an INT: str_year0 varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
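
The trap with comparing a VARCHAR to a number is the implicit cast: MySQL converts the string in every row, so an index on the column goes unused. A hypothetical illustration (some_table stands in for whichever table holds str_year0):

SELECT COUNT(*) FROM some_table WHERE str_year0 = 2020;    -- casts every row; full scan
SELECT COUNT(*) FROM some_table WHERE str_year0 = '2020';  -- can use an index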

commodities_monthly_prices has both prices and a description of the commodities (market, currency, etc.). The description is probably constant, so it should likely live in a separate table.
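
One possible shape for that split; the description columns are placeholders, since the full schema isn't shown, and this assumes commodities.id is INT UNSIGNED:

CREATE TABLE commodity_descriptions (
    commodity_id INT UNSIGNED NOT NULL PRIMARY KEY,
    market       VARCHAR(100),    -- placeholder column
    currency     CHAR(3),         -- placeholder column
    FOREIGN KEY (commodity_id) REFERENCES commodities(id)
);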

Your question is about the speed of loading. Is the data coming from a CSV file via LOAD DATA? If so, I suggest you load it into a staging table tailored to what the incoming data looks like. Then massage the data: normalize, cleanse, fix typos, etc. Finally, copy the data into the tables you want for analysis; these are not necessarily in the same form as the raw data. This would, for example, give you a chance to split commodities_monthly_prices. A sketch of that flow follows.
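
The file path and the staging table below are placeholders; the INSERT column list matches the columns visible in the question's query:

-- 1. Load the raw file into a staging table shaped like the CSV.
LOAD DATA LOCAL INFILE '/path/to/raw_prices.csv'
INTO TABLE staging_prices
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

-- 2. Cleanse/normalize in staging_prices, then copy into the real table.
INSERT INTO commodities_monthly_prices
       (commodity_id, location_id, price_type_id, measure_unit_id,
        price_year, price_month, price)
SELECT commodity_id, location_id, price_type_id, measure_unit_id,
       price_year, price_month, price
FROM staging_prices;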

A no-no in schema design is to have multiple columns representing an array. I am thinking of str_year, admin_name, etc. Since I don't understand their purpose, I don't have a specific recommendation other than to say that the restructuring I described above is the time to fix that.

You mentioned JSON, but I don't see any JSON in the schema.

Upvotes: 1
