GKall
GKall

Reputation: 31

AWS Redshift Query Plan Warning

I am new to RedShift and just experimenting at this stage to help with table design.

We have a very simple table with about 6 million rows and 2 integer fields.

Both integer fields are in the sort key but the plan has a warning - "very selective query filter".

The STL_Alert_Event_Log entry is: 'Very selective query filter:ratio=rows(61)/rows_pre_user_filter(524170)=0.000116'

The query we are running is:

select count(*) 
from LargeNumberofRowswithUniKey r 
where r.benchmarkid = 291891 and universeid = 300901

Our Table DDL is:

CREATE TABLE public.LargeNumberofRowswithUniKey
(
    benchmarkid INTEGER NOT NULL DISTKEY,
    UniverseID INTEGER NOT NULL
)
SORTKEY
(
    benchmarkid,UniverseID
);

We have also run the following commands on the table:

Vacuum full public.LargeNumberofRowswithUniKey;
Analyze public.LargeNumberofRowswithUniKey;

The screenshot of the plan is here: [Query Plan Image][1] My expectation was that the multiple sort key including Benchmark and Universe and the fact that both are part of the filter predicate would ensure that the design was optimal for the sample query. This does not seem to be the case, hence the red warning symbol in the attached image. Can anyone shed light on this?

Thanks

George

Update 2017/09/07 I have some more information that may help:

If I run a much simpler query which just filters on the first column of the sort key.

select r.benchmarkid 
from LargeNumberofRowswithUniKey r 
where r.benchmarkid = 291891

This results in 524,170 rows being scanned according to the actual query plan from the console. When I look at the blocks using STV_BLOCKLIST. The relevant blocks that might be required to satisfy my query are:

|slice|col|tbl   |blocknum|num_values|minvalue|maxvalue|
|    1|  0|346457|       4|    262085|  291881|  383881|
|    3|  0|346457|       4|    262085|  291883|  344174|
|    0|  0|346457|       5|    262085|  291891|  344122|

So shouldn't there be 786,255 rows scanned (3 x 262,085) instead of 524,170 (2 x 262,085) as listed in the plan?

Upvotes: 3

Views: 3956

Answers (2)

John Rotenstein
John Rotenstein

Reputation: 269171

Side-observation: If you are always selecting values by using both benchmarkid and UniverseID, you should probably use DISTKEY EVEN.

The reason for this is that a benchmarkid DISTKEY would distribute the data between slices based on benchmarkid. All the values for a given benchmarkid would be on the same slice. If your query always provides a benchmarkid in the query, then the query only utilizes one slice.

On the other hand, if it used DISTKEY EVEN, then every slice can participate in the query, making it more efficient (for queries with WHERE benchmarkid = xxx).

A general rule of thumb is:

  • Use DISTKEY for fields commonly used in JOIN or GROUP BY
  • Use SORTKEY for fields commonly used in WHERE

Upvotes: 5

Nathan Griffiths
Nathan Griffiths

Reputation: 12756

The "very selective filter" warning is returned when the rows selected vs rows scanned ratio is less than 0.05 i.e. a relatively large number of rows are scanned compared to the number of rows actually returned. This can be caused by having a large number of unsorted rows in a table, which can be resolved by running a vacuum. However, as you're already doing that I think this is happening because your query is actually very selective (you're selecting a single combination of benchmarkid and universeid) and so you can probably ignore this warning.

Upvotes: 4

Related Questions