Reputation: 71
I have two tables:
CREATE TABLE share_prices (
  price_id int(10) unsigned NOT NULL AUTO_INCREMENT,
  price_date date NOT NULL,
  company_id int(10) NOT NULL,
  high decimal(20,2) DEFAULT NULL,
  low decimal(20,2) DEFAULT NULL,
  close decimal(20,2) DEFAULT NULL,
  PRIMARY KEY (price_id),
  UNIQUE KEY price_date (price_date,company_id),
  KEY company_id (company_id),
  KEY price_date_2 (price_date)
) ENGINE=InnoDB AUTO_INCREMENT=368586 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
And
CREATE TABLE rating_lookup (
  rating_id int(11) NOT NULL,
  start_date date DEFAULT NULL,
  start_price decimal(10,2) DEFAULT NULL,
  broker_id int(11) DEFAULT NULL,
  company_id int(11) DEFAULT NULL,
  end_date date DEFAULT NULL,
  PRIMARY KEY (rating_id),
  KEY idx_rating_lookup_company_id (company_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This is the current query:
SELECT broker_id, count(rating_id)
FROM (
    SELECT rating_lookup.*,
           share_prices.company_id AS correct_company,
           share_prices.price_date,
           max(high) AS peak_gain,
           ( ( ( max(high) - rating_lookup.start_price ) / rating_lookup.start_price ) * 100 ) AS percent_gain
    FROM rating_lookup, share_prices
    WHERE share_prices.price_date > rating_lookup.start_date
      AND share_prices.price_date < ifnull(end_date, curdate())
      AND share_prices.company_id = rating_lookup.company_id
    GROUP BY rating_id
    HAVING percent_gain > 5
) correct
GROUP BY broker_id
Currently this query takes 10.969 sec.
The isolated subquery takes 0.391 sec (duration) / 10.438 sec (fetch).
Query objective:
Get the total number of correct ratings per broker_id.
A correct rating is defined as a rating that has at some point reached +5% over its start_price.
I am looking to drastically decrease the query time, even if restructuring the database is the only way.
EXPLAIN of the above query:
+----+-------------+---------------+-------+--------------------------------------+------------+---------+-----------------------------------------+---------+---------------------------------+
| id | select_type | table         | type  | possible_keys                        | key        | key_len | ref                                     | rows    | Extra                           |
+----+-------------+---------------+-------+--------------------------------------+------------+---------+-----------------------------------------+---------+---------------------------------+
|  1 | PRIMARY     | <derived2>    | ALL   |                                      |            |         |                                         | 3894800 | Using temporary; Using filesort |
|  2 | DERIVED     | rating_lookup | index | PRIMARY,idx_rating_lookup_company_id | PRIMARY    | 4       |                                         |   18200 | Using where                     |
|  2 | DERIVED     | share_prices  | ref   | price_date,company_id,price_date_2   | company_id | 4       | brokermetrics.rating_lookup.company_id  |     214 | Using where                     |
+----+-------------+---------------+-------+--------------------------------------+------------+---------+-----------------------------------------+---------+---------------------------------+
share_prices: ~375,000 rows
rating_lookup: ~18,000 rows, with around 46 unique brokers
Upvotes: 2
Views: 94
Reputation: 1766
Extending Klas' answer, below is the schema of a "summary" table that could be populated with pre-calculated records on a per-broker, per-company, per-day basis.
Disclaimer: haven't tested on real data but should work.
CREATE TABLE `price_summary` (
  `price_id` int(10) NOT NULL,
  `broker_id` int(10) NOT NULL DEFAULT '0',
  `company_id` int(10) NOT NULL DEFAULT '0',
  `start_date` int(10) NOT NULL DEFAULT '0',             -- UNIX timestamp
  `end_date` int(10) NOT NULL DEFAULT '0',               -- UNIX timestamp
  `peak_gain` int(10) NOT NULL DEFAULT '0',
  `max_price` int(10) NOT NULL DEFAULT '0',
  `percentage_gain` decimal(10,2) NOT NULL DEFAULT '0',  -- two decimals, so fractional gains are not rounded away
  `updated_on` int(10) NOT NULL DEFAULT '0'              -- UNIX timestamp
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Indexes for table `price_summary`
--
ALTER TABLE `price_summary`
ADD PRIMARY KEY (`price_id`),
ADD UNIQUE KEY `broker_company_date` (`broker_id`,`company_id`,`start_date`) USING BTREE,
ADD KEY `broker_id` (`broker_id`),
ADD KEY `company_id` (`company_id`),
ADD KEY `start_date` (`start_date`),
ADD KEY `end_date` (`end_date`),
ADD KEY `peak_gain` (`peak_gain`),
ADD KEY `max_price` (`max_price`),
ADD KEY `percentage_gain` (`percentage_gain`);
ALTER TABLE `price_summary`
MODIFY `price_id` int(10) NOT NULL AUTO_INCREMENT;
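The population step is not shown above; a minimal sketch of how price_summary could be filled from the two original tables follows. The column mapping is an assumption (peak_gain is omitted since its intended semantics aren't spelled out here), and percentage_gain mirrors the question's percent_gain formula:
-- Sketch only: one row per rating; dates go in as UNIX timestamps.
INSERT INTO price_summary
    (broker_id, company_id, start_date, end_date, max_price, percentage_gain, updated_on)
SELECT r.broker_id,
       r.company_id,
       UNIX_TIMESTAMP(r.start_date),
       UNIX_TIMESTAMP(IFNULL(r.end_date, CURDATE())),
       MAX(p.high),
       ((MAX(p.high) - r.start_price) / r.start_price) * 100,
       UNIX_TIMESTAMP()
FROM rating_lookup AS r
JOIN share_prices AS p ON p.company_id = r.company_id
WHERE p.price_date > r.start_date
  AND p.price_date < IFNULL(r.end_date, CURDATE())
GROUP BY r.rating_id
ON DUPLICATE KEY UPDATE
    max_price       = VALUES(max_price),
    percentage_gain = VALUES(percentage_gain),
    updated_on      = VALUES(updated_on);
Thanks to the unique key on (broker_id, company_id, start_date), re-running this after each price load keeps the summary current.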
And here is a sample query to retrieve the desired records:
SELECT broker_id,
       count(company_id) AS company_count
FROM price_summary
WHERE start_date > {input_timestamp}
  AND end_date < {input_timestamp/now()}
  AND percentage_gain > {input_percentage}
GROUP BY broker_id
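For example, assuming the date columns hold UNIX timestamps (as the int(10) definitions suggest), a concrete call with hypothetical values could be:
SELECT broker_id,
       count(company_id) AS company_count
FROM price_summary
WHERE start_date > UNIX_TIMESTAMP('2017-01-01')
  AND end_date < UNIX_TIMESTAMP(NOW())
  AND percentage_gain > 5
GROUP BY broker_id;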
Upvotes: 1
Reputation: 33273
I assume that share prices are inserted once per day after the market closes (or a couple of times per day if you cover multiple markets).
If you don't manage to tune the query sufficiently, you can pre-calculate the result. Run the query each time you finish loading a batch of new stock prices, and insert the result into a new table. Reading the pre-calculated data should be fast enough.
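For illustration, a minimal sketch (the table name and layout are just an example, not from the question):
-- Hypothetical result table: one row per broker, rebuilt after each price load.
CREATE TABLE broker_correct_ratings (
  broker_id     int NOT NULL PRIMARY KEY,
  correct_count int NOT NULL
);

-- After loading a batch of prices: wipe and refill using the question's query.
TRUNCATE broker_correct_ratings;

INSERT INTO broker_correct_ratings (broker_id, correct_count)
SELECT broker_id, count(rating_id)
FROM (
    SELECT rating_lookup.rating_id, rating_lookup.broker_id
    FROM rating_lookup
    JOIN share_prices ON share_prices.company_id = rating_lookup.company_id
    WHERE share_prices.price_date > rating_lookup.start_date
      AND share_prices.price_date < ifnull(end_date, curdate())
    GROUP BY rating_id
    HAVING ((max(high) - start_price) / start_price) * 100 > 5
) correct
GROUP BY broker_id;
Reading the result is then a trivial scan of at most ~46 rows.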
Upvotes: 2
Reputation: 142306
PRIMARY KEY (price_id), -- useless
UNIQUE KEY price_date (price_date,company_id), -- could/should be PK
KEY company_id (company_id),
KEY price_date_2 (price_date) -- redundant
-->
PRIMARY KEY(price_date, company_id),
KEY company_id (company_id)
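In ALTER form, that restructuring would be roughly the following (a sketch; dropping price_id is only safe if nothing else references it):
ALTER TABLE share_prices
  DROP PRIMARY KEY,
  DROP KEY price_date,     -- its (price_date, company_id) pair becomes the PK
  DROP KEY price_date_2,   -- redundant: the new PK already starts with price_date
  DROP COLUMN price_id,
  ADD PRIMARY KEY (price_date, company_id);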
decimal(20,2) consumes 9 bytes; no existing stock is likely to exceed 6 digits to the left of the decimal point, and it does not handle low-priced stocks that need more than two decimals. Consider DECIMAL(8,2) (4 bytes) or DECIMAL(10,4) (5 bytes). FLOAT (4 bytes) avoids most of those issues, but is limited to 7 significant digits.
Smaller --> more cacheable --> less I/O --> faster.
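For example (an untested sketch; run SELECT MAX(high) FROM share_prices first to confirm the smaller precision fits your data):
ALTER TABLE share_prices
  MODIFY high  DECIMAL(8,2) DEFAULT NULL,
  MODIFY low   DECIMAL(8,2) DEFAULT NULL,
  MODIFY close DECIMAL(8,2) DEFAULT NULL;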
Don't SELECT stuff you don't need. All you need is
SELECT rating_id, broker_id
and move the expression into the HAVING:
HAVING ((max(high) - start_price) / start_price) * 100 > 5
Please use the JOIN..ON syntax:
FROM rating_lookup, share_prices
WHERE share_prices.company_id = rating_lookup.company_id
AND ...
-->
FROM rating_lookup AS r
JOIN share_prices AS p
ON p.company_id = r.company_id
WHERE ...
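Folding all of the above together, the query would look something like this (a sketch assembled from these suggestions, not a tested rewrite):
SELECT broker_id, count(rating_id)
FROM (
    SELECT r.rating_id, r.broker_id
    FROM rating_lookup AS r
    JOIN share_prices AS p ON p.company_id = r.company_id
    WHERE p.price_date > r.start_date
      AND p.price_date < ifnull(r.end_date, curdate())
    GROUP BY r.rating_id
    HAVING ((max(p.high) - r.start_price) / r.start_price) * 100 > 5
) correct
GROUP BY broker_id;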
Upvotes: 1