Reputation: 7169
I have an Oracle 10g database of genomic data with several tables of more than 100 million rows each that look similar to the following:
ID   AssayID  Sample    Mutation  Call  Frequency
101  12578    Sample01  T367G     P     0.87
102  31384    Sample01  A2345C    A     0.28
103  3453     Sample01  T247C     P     0.67
104  12578    Sample02  G235del   M     0.11
105  7868     Sample02  None      P     0.98
ID is a unique PK; AssayID and Sample are foreign keys. For each Sample value there are ~50k rows, and each AssayID occurs exactly once per Sample. Mutation is relatively random, and Call can be one of three values. Queries filter on AssayID, Sample, Mutation, Call, or on a value in a linked table via AssayID and Sample. A typical query:
select t.*
from this_table t
join assay_table a on t.assayid = a.assayid
join sample_table s on t.sample = s.sample
where s.name = 'xxx'
  and a.gene in ('abc', 'xyz')
  and t.call = 'P'
The WHERE clause will usually filter on multiple columns, but never only on columns from the base data table. How do I design the table to get the best query performance when selecting all columns?
Do I use indexes only, partitions only, or a combination of the two? Disk space and insert/update performance are not concerns.
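For illustration, one combination I've been considering is hash partitioning on Sample (every query reaches the table through it, and each sample contributes ~50k rows) with local indexes on the other filter columns, so partition pruning applies to both the table and the index scans. This is only a sketch: the column types, partition count, and index choices are guesses, not a recommendation.

-- Hypothetical DDL; types, partition count, and indexes are
-- assumptions to be validated against the real workload.
create table this_table (
  id        number primary key,
  assayid   number not null references assay_table (assayid),
  sample    varchar2(30) not null references sample_table (sample),
  mutation  varchar2(20),
  call      char(1),
  frequency number(4,3)
)
partition by hash (sample) partitions 64;

-- Local indexes are equipartitioned with the table, so queries
-- that filter on sample prune both the table and the index.
create index ix_this_assayid     on this_table (assayid) local;
create index ix_this_sample_call on this_table (sample, call) local;

Since Call has only three distinct values and the table is read-mostly, a bitmap index on call is another variant worth benchmarking; on a partitioned table it would also have to be local.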
Upvotes: 1
Views: 1167
Reputation: 7169
After creating numerous test copies of tables with different combinations of indexes and partitions, and running a battery of performance analyses using a broad selection of queries, I don't think that there is a single, simple answer for this question. Each situation is different, and the scope of this question is too broad for this forum. Thank you everyone for the feedback, it was all helpful.
Upvotes: 0
Reputation: 8362
As a first step, you could run Oracle's SQL Access Advisor and see what recommendations it provides.
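For a single representative statement, a minimal way to invoke it is DBMS_ADVISOR.QUICK_TUNE; the task name below is a placeholder, and the SQL text is the example query from the question.

-- Quick-tune one representative query; 'GENOMIC_TUNE' is a
-- placeholder task name.
begin
  dbms_advisor.quick_tune(
    dbms_advisor.sqlaccess_advisor,
    'GENOMIC_TUNE',
    'select t.* from this_table t
       join assay_table a on t.assayid = a.assayid
       join sample_table s on t.sample = s.sample
      where s.name = ''xxx'' and a.gene in (''abc'', ''xyz'')
        and t.call = ''P''');
end;
/

-- Inspect the ranked recommendations, then generate the DDL script.
select rec_id, rank, benefit
from user_advisor_recommendations
where task_name = 'GENOMIC_TUNE';

select dbms_advisor.get_task_script('GENOMIC_TUNE') from dual;

The advisor recommends indexes, materialized views, and (from 11g on) partitioning schemes, so its output is a reasonable baseline to compare your own designs against.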
Upvotes: 1