johncarter
johncarter

Reputation: 71

Slow subquery: group by a groupwise maximum

I have two tables:

CREATE TABLE share_prices (
    price_id int(10) unsigned NOT NULL AUTO_INCREMENT,
    price_date date NOT NULL,
    company_id int(10) NOT NULL,
    high decimal(20,2) DEFAULT NULL,
    low decimal(20,2) DEFAULT NULL,
    close decimal(20,2) DEFAULT NULL,
    PRIMARY KEY (price_id),
    UNIQUE KEY price_date (price_date,company_id),
    KEY company_id (company_id),
    KEY price_date_2 (price_date)
) ENGINE=InnoDB AUTO_INCREMENT=368586 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

And

CREATE TABLE rating_lookup (
    rating_id int(11) NOT NULL,
    start_date date DEFAULT NULL,
    start_price decimal(10,2) DEFAULT NULL,
    broker_id int(11) DEFAULT NULL,
    company_id int(11) DEFAULT NULL,
    end_date date DEFAULT NULL,
    PRIMARY KEY (rating_id),
    KEY idx_rating_lookup_company_id (company_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This is the current query:

SELECT broker_id, count(rating_id)

FROM (

    SELECT rating_lookup.*,
    share_prices.company_id as correct_company,
    share_prices.price_date,
    max(high) as peak_gain,
    ( ( ( max(high) - rating_lookup.start_price ) / rating_lookup.start_price ) * 100 ) as percent_gain

    FROM rating_lookup, share_prices

    WHERE share_prices.price_date > rating_lookup.start_date 
    AND share_prices.price_date < ifnull(end_date, curdate())
    AND share_prices.company_id = rating_lookup.company_id

    GROUP BY rating_id

    HAVING percent_gain > 5

) correct

GROUP BY broker_id

Currently this query takes 10.969 sec.

The isolated subquery takes 0.391 sec (duration) / 10.438 sec (fetch)

Query objective:

Get the total amount of correct ratings per broker_id.

A correct rating is defined as a rating that has a reached + 5% since its start_price.


I am looking to drastically decrease the query time, even if restructuring the database is the only way.


Appendix

Explain of above query:

+---+---------+---------------+-------+--------------------------------------+------------+---+----------------------------------------+---------+---------------------------------+
| 1 | PRIMARY | <derived2>    | ALL   |                                      |            |   |                                        | 3894800 | Using temporary; Using filesort |
| 2 | DERIVED | rating_lookup | index | PRIMARY,idx_rating_lookup_company_id | PRIMARY    | 4 |                                        |   18200 | Using where                     |
| 2 | DERIVED | share_prices  | ref   | price_date,company_id,price_date_2   | company_id | 4 | brokermetrics.rating_lookup.company_id |     214 | Using where                     |
+---+---------+---------------+-------+--------------------------------------+------------+---+----------------------------------------+---------+---------------------------------+

share_prices ~ 375,000 rows

rating_lookup ~ 18,000 rows with around 46 unique brokers

Upvotes: 2

Views: 94

Answers (3)

J A
J A

Reputation: 1766

Extending Klas' answer, below is a schema of a "summary" table that could be populated with pre-calculated record per-broker, per-company, per-day basis.

Disclaimer: haven't tested on real data but should work.

CREATE TABLE `price_summary` (
`price_id` int(10) NOT NULL,
`broker_id` int(10) NOT NULL DEFAULT '0',
`company_id` int(10) NOT NULL DEFAULT '0',
`start_date` int(10) NOT NULL DEFAULT '0',
`end_date` int(10) NOT NULL DEFAULT '0',
`peak_gain` int(10) NOT NULL DEFAULT '0',
`max_price` int(10) NOT NULL DEFAULT '0',
`percentage_gain` decimal(10,0) NOT NULL DEFAULT '0',
`updated_on` int(10) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

--
-- Indexes for dumped tables
--

--
-- Indexes for table `price_summary`
--
ALTER TABLE `price_summary`
ADD PRIMARY KEY (`price_id`),
ADD UNIQUE KEY `broker_company_date` (`broker_id`,`company_id`,`start_date`) USING BTREE,
ADD KEY `broker_id` (`broker_id`),
ADD KEY `company_id` (`company_id`),
ADD KEY `start_date` (`start_date`),
ADD KEY `end_date` (`end_date`),
ADD KEY `peak_gain` (`peak_gain`),
ADD KEY `max_price` (`max_price`),
ADD KEY `percentage_gain` (`percentage_gain`);

ALTER TABLE `price_summary`
MODIFY `price_id` int(10) NOT NULL AUTO_INCREMENT; 

And a sample query to retrieve desired records.

SELECT
    broker_id,
    count(company_id) as company_count
FROM
    price_summary
WHERE
    start_date > {input_timestamp}
    AND
    end_date < {input_timestamp/now()}
    AND
    percentage_gain > {input_percentage}
GROUP BY
    broker_id 

Upvotes: 1

Klas Lindb&#228;ck
Klas Lindb&#228;ck

Reputation: 33273

I assume that share prices are inserted once per day after the market closes (or a couple of times per day if you cover multiple markets).

If you don't manage to tune the query sufficiently you can pre-calculate the result. Run the query after every time you finish loading a batch of new stock prices. Insert the result in a new table. Reading the pre-calculated data should be fast enough.

Upvotes: 2

Rick James
Rick James

Reputation: 142306

PRIMARY KEY (price_id),   -- useless
UNIQUE KEY price_date (price_date,company_id), -- could/should be PK
KEY company_id (company_id),
KEY price_date_2 (price_date)  -- redundant

-->

PRIMARY KEY(price_date, company_id),
KEY company_id (company_id)

decimal(20,2) consumes 9 bytes, no existing stock is likely to exceed 6 digits to the left of the decimal point, and does not handle low-priced stocks that need more than two decimals. Consider DECIMAL(8,2) (4 bytes) or (10,4) (5 bytes). FLOAT (4 bytes) avoids most of the issues, but is limited to 7 significant digits.

Smaller --> more cacheable --> less I/O --> faster.

Don't SELECT stuff you don't need. All you need is

SELECT rating_id, broker_id

and move the expression to the HAVING:

HAVING ((( max(high)... *100) > 5

Please use the JOIN..ON syntax:

  FROM  rating_lookup, share_prices
  WHERE share_prices.company_id = rating_lookup.company_id
    AND ...

-->

  FROM rating_lookup AS r
  JOIN share_prices AS p
    ON p.company_id = r.company_id
  WHERE ...

Upvotes: 1

Related Questions