Reputation: 5467
I have large table ir_data (150GB) which contains data for different dates (column val_date). I need to know if a given date is available in the ir_data at various points in my application.
select distinct(val_date) from ir_data
I the following experiment ir_data contains 29 different values for val_date.
SETUP 1
I expected an index on ir_data (val_date, key_id, other_colum) to help finding the 29 values quickly. In fact this takes more than 5 minutes:
Query 1 of 1, Rows read: 29, Elapsed time (seconds) - Total: 343.96, SQL query: 343.958, Reading results: 0.002
I always expected an index to be a tree where the nodes are stored in a tree structure, e.g. like this
val_date -> key_id -> other_column -> data-nodes
1.1.2017 -> 0-50 -> A -> (1.1.2017, 0, Automobile), (1.1.2017, 2, Amsterdam)
-> B-E -> (1.1.2017, 12, Batman)
-> 51-100 -> A -> ...
X
-> 666-1000 -> A
-> B-C
-> E
2.1.2017 -> ...
Based on this structure getting the 29 different val_dates should be very quick.
Question: Why does this take so long???
Sub-question: Is there a way to fix this without creating another table?
SETUP 2
I created another index which only contains val_date. It takes about the same amount of time.
Query-Plan:
The type of query is SELECT.
2 operator(s) under root
|ROOT:EMIT Operator (VA = 2)
|
| |GROUP SORTED Operator (VA = 1)
| |Distinct
| |
| | |SCAN Operator (VA = 0)
| | | FROM TABLE
| | | ir_data
| | | Index : ir_data_idx1 <-- this is the index containing only val_date.
| | | Forward Scan.
| | | Positioning at index start.
| | | Index contains all needed columns. Base table will not be read.
| | | Using I/O Size 16 Kbytes for index leaf pages.
| | | With MRU Buffer Replacement Strategy for index leaf pages.
Upvotes: 3
Views: 705
Reputation: 5467
A recursive CTE speeds up this query considerably, i.e. very few distinct values in a large table. The problem is that index seeking during search for distinct values is not implemented currently. Here is a link to the approach.
Upvotes: 0
Reputation: 9365
Your table is very large as well as your index. As you can see in the plan, the engine performs an index scan. This operation will be lengthy because it will scan your entire index for distinct values.
As a first step, You could try update index statistics
on the index, though, i don't really think it'll help.
If it's a one time manual operation, i guess you'll be satisfied with the 5 min operation.
If it's a query executed by your application then you have 2 choices i can think of:
Create a precomputed result set
. This is a materialized view - the results are stored like a regular table (as opposed to a view where only its definition is stored).
It will automatically refresh the results in the view and values will be retrieved fast.
Important: like indexes, it will have an impact on the performance of Insert, Update...
It can look like:
create precomputed result set prs_ir_data
immediate refresh
as
select distinct val_date
from ir_data
You can read here and here about precomputed result set
Upvotes: 2