Reputation: 1495
I have a page in which there is a <select>
menu, which contains all of the values from a small table (229 rows), such that <option value='KEY'>VALUE</option>
.
This select menu is a filter for a query which runs on a large table (3.5M rows). In the large table is a foreign key which references KEY from small table.
However, in the results of the large table query, I also need to display the relative VALUE
from the small table.
I could quite easily do an INNER JOIN
to retrieve the results, OR I could do a separate 'pre'-query to my smaller table, fetch it's values into an array, and then let the application get the VALUE
from small table results.
The application is written in PHP.
Hardware resources IS an issue (cannot upgrade to higher instance right now, boss constrained) - I am running this on a t2.micro RDS on Amazon Web Services instance. I have added both single and covering indexes on columns in WHERE & HAVING clauses, and my server is reporting that I have 46mb RAM available.
Given the above, I know that JOIN
can be expensive especially on big tables. Does it just make sense here to do 2 queries, and let the application handle some of the work, until I can negotiate better resources?
EDIT:
No Join : 6.9 sec
SELECT nationality_id, COUNT(DISTINCT(txn_id)) as numtrans,
SUM(sales) as sales, SUM(units) as units, YrQtr
FROM 1_txns
GROUP BY nationality_id;
EXPLAIN
'1', 'SIMPLE', '1_txns', 'index', 'covering,nat', 'nat', '5', NULL, '3141206', NULL
With Join: 59.03 Sec
SELECT 4_nationality.nationality, COUNT(DISTINCT(txn_id)) as numtrans,
SUM(sales) as sales, SUM(units) as units, YrQtr
FROM 1_txns INNER JOIN 4_nationality USING (nationality_id)
GROUP BY nationality_id
HAVING YrQtr LIKE :period;
EXPLAIN
'1', 'SIMPLE', '4_nationality', 'ALL', 'PRIMARY', NULL, NULL, NULL, '229', 'Using temporary; Using filesort'
'1', 'SIMPLE', '1_txns', 'ref', 'covering,nat', 'nat', '5', 'reports.4_nationality.nationality_id', '7932', NULL
Schema is
Table 1_txns (txn_id, nationality_id, yrqtr, sales, units)
Table 4_nationality (nationality_id, nationality)
I have separate indexes on each nationality_id, txn_id, yrqtr. in my Large Transactions Table. And just a primary key index on my small table.
Something strange also, is that the query WITHOUT the join, is missing a row from it's results!
Upvotes: 0
Views: 55
Reputation: 142208
First, move the HAVING
to WHERE
so that the rest of the query has less to do. Second, delay the lookup of nationality
until after the GROUP BY
:
SELECT
( SELECT nationality
FROM 4_nationality
WHERE nationality_id = t.nationality_id
) AS nationality,
COUNT(DISTINCT(txn_id)) as numtrans,
SUM(sales) as sales,
SUM(units) as units,
YrQtr
FROM 1_txns AS t
WHERE YrQtr LIKE :period
GROUP BY nationality_id;
If possible, avoid wild cards and simply do YrQtr = :period
. That would allow INDEX(YrQtr, nationality_id)
for even more performance.
Upvotes: 0
Reputation: 48139
If your lookup "menu" list table is only the 229 rows as stated, and it has a unique key, and your menu table has index on (key, value), the join would be negligible... especially if your only querying the results based on a single key anyhow.
The bigger question to me would be on your table of 3.5 million records. At 229 "menu" items, it would be returning an average of over 15k records each time. And I am sure that not every category is evenly balanced... some could have a few hundred or thousand entries, others could have 30k+ entries. Is there some other criteria that would allow smaller subsets to be returned? Obviously not enough info to quantify.
Now, after seeing your revised post while entering this, I see you are trying to get aggregations. The table would otherwise be fixed for historical data. I would suggest a summary table be done on a per Nationality/YrQtr basis. This way, you can query that directly if the period is PRIOR to the current period in question. If current period, then sum aggregates from production. Again, since transactions wont change historically, neither would their counts and you would have immediate response from the pre-summary table.
Feedback
As for how / when to implement a summary table. I would create the table with the respective columns you need... Nationality, Period (Yr/Month), and respective counts for distinct transactions, etc.
I would then pre-aggregate once for all your existing data for everything UP TO but not including the current period (Yr/Month). Now you have your baseline established in summary.
Then, add a trigger to your transaction table on insert. Then, process something like... (AND NOTE, THIS IS NOT ACTUAL TRIGGER, BUT CONTEXT OF WHAT TO DO)
update summaryTable
set numTrans = numTrans + 1,
TotSales = TotSales + NEWENTRY.Sales,
TotUnits = TotUnits + NEWENTRY.Units
where
Nationality = NEWENTRY.Nationality
AND YrQtr = NEWENTRY.YrQtr
if # records affected by the update = 0
Insert into SummaryTable
( Nationality,
YrQtr,
NumTrans,
TotSales,
TotUnits )
values
( NEWENTRY.Nationality,
NEWENTRY.YrQtr,
1,
NEWENTRY.Sales,
NEWENTRY.Units )
Now, your aggregates will ALWAYS be in synch in the summary table after EVERY record inserted into the transaction table. You can ALWAYS query this summary table instead of the full transaction table. If you never have activity for a given Nationality / YrQtr, no record will exist.
Upvotes: 1