Reputation: 5028
My application allows users to collect measurement data as part of an experiment, and needs to have the ability to report on all of the measurements ever taken.
Below is a very simplified version of the tables I have:
CREATE TABLE EXPERIMENTS(
EXPT_ID INT,
EXPT_NAME VARCHAR2(255 CHAR)
);
CREATE TABLE USERS(
USER_ID INT,
EXPT_ID INT
);
CREATE TABLE SAMPLES(
SAMPLE_ID INT,
USER_ID INT
);
CREATE TABLE MEASUREMENTS(
MEASUREMENT_ID INT,
SAMPLE_ID INT,
MEASUREMENT_PARAMETER_1 NUMBER,
MEASUREMENT_PARAMETER_2 NUMBER
);
In my database there are 2000 experiments, each of which has 18 users. Each user has 6 samples to measure, and would do 100 measurements per sample.
This means that there are 2000 * 18 * 6 * 100 = 21600000 measurements currently stored in the database.
I'm trying to write a query that will get the AVG() of measurement parameter 1 and 2 for each user - that would return about 36,000 rows.
The query I have is extremely slow - I've left it running for over 30 minutes and it doesn't come back with anything. My question is: is there an efficient way of getting the averages? And is it actually possible to get results back for this amount of data in a reasonable time, say 2 minutes? Or am I being unrealistic?
Here's (again a simplified version) the query I have:
SELECT
E.EXPT_ID,
U.USER_ID,
AVG(MEASUREMENT_PARAMETER_1) AS AVG_1,
AVG(MEASUREMENT_PARAMETER_2) AS AVG_2
FROM
EXPERIMENTS E,
USERS U,
SAMPLES S,
MEASUREMENTS M
WHERE
U.EXPT_ID = E.EXPT_ID
AND S.USER_ID = U.USER_ID
AND M.SAMPLE_ID = S.SAMPLE_ID
GROUP BY E.EXPT_ID, U.USER_ID
This will return a row for each expt_id/user_id combination and the average of the 2 measurement parameters.
Upvotes: 0
Views: 428
Reputation: 6640
For your query, in any case, the DBMS needs to read the complete measurements table. This is by far the biggest part of data to read, and the part which takes most time if the query is optimized well (will come to that later). That means that the minimum runtime of your query is about the time it takes to read the complete measurements table from whereever it is stored. You can get a rough estimate by checking how much data that is (in MB or GB) and checking how much time it would take to read this amount of data from the harddisk (or where the table is stored). If your query runs slower by a factor of 5 or more, you can be sure that there is room for optimization.
There is a vast amount of information (tutorials, individual hints which can be invaluable, and general practices lists) about how to optimize oracle queries. You will not get through all this information quickly. But if you provide the execution plan of your query (this is what oracle's query optimizer thinks is the best way to fulfill your query), we will be able to spot steps which can be optimized and suggest solutions.
Upvotes: 1