Gagan Maheshwari

Reputation: 33

Redshift large 'in' clause best practices

We have a query that takes a list of parameter values in its IN clause. Some time back this query failed to execute because the list grew so large that the resulting query exceeded Redshift's 16 MB query size limit. We then processed the data in batches so as to stay under the 16 MB limit.

My question is: what factors or pitfalls should we keep in mind when supplying such a large list to an IN clause, and is there an alternative way to deal with this much data in an IN clause?

Upvotes: 2

Views: 2500

Answers (2)

AlexYes

Reputation: 4208

A large IN list is not a best practice in itself; for large lists it's better to use a join:

  1. construct a virtual table as a subquery
  2. join your target table to the virtual table

like this:

with
your_list as (
    select 'first_value' as search_value
    union select 'second_value'
    ...
)
select ...
from target_table t1
join your_list t2
on t1.col=t2.search_value
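If the value list lives in application code, the CTE above can be generated programmatically. A minimal Python sketch, assuming hypothetical names (`build_union_cte_query`, `target_table`, `col` are placeholders, not anything from the question); naive quote-escaping is shown for illustration only, and a real application should prefer the driver's parameter binding:

```python
# Hypothetical helper: render the "with your_list as (...)" pattern
# from a Python list of string values.

def build_union_cte_query(values, target_table, join_column):
    """Build a query joining target_table to a UNION-based value list."""
    # Naive escaping of single quotes for illustration; prefer
    # parameterized queries in production.
    escaped = [v.replace("'", "''") for v in values]
    first = f"select '{escaped[0]}' as search_value"
    rest = [f"union select '{v}'" for v in escaped[1:]]
    your_list = "\n    ".join([first] + rest)
    return (
        "with\n"
        "your_list as (\n"
        f"    {your_list}\n"
        ")\n"
        "select t1.*\n"
        f"from {target_table} t1\n"
        "join your_list t2\n"
        f"on t1.{join_column} = t2.search_value"
    )

sql = build_union_cte_query(["first_value", "second_value"], "target_table", "col")
print(sql)
```

The same 16 MB statement limit still applies to the generated text, so very large lists may need the temporary-table approach described in the other answer.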

Upvotes: 0

Jon Scott

Reputation: 4354

If you have control over how you are generating your code, you could split it up as follows

First, drop and recreate the filter table:

drop table if exists myfilter;
create table myfilter (filter_text varchar(max));

Second, populate the filter table in batches of a suitable size, e.g. 1000 values at a time:

insert into myfilter
values ({{myvalue1}}), ({{myvalue2}}), ({{myvalue3}});  -- etc., up to 1000 rows per statement

Repeat the above step until all of your values are inserted.
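The batching step above can be sketched in Python. This is a hypothetical helper (`batched_inserts` is not part of any library), again with naive quote-escaping for illustration; in practice use your driver's parameter binding or a `COPY` load:

```python
# Hypothetical sketch: chunk a large value list into multi-row INSERT
# statements of at most batch_size rows each.

def batched_inserts(values, table="myfilter", batch_size=1000):
    """Yield one INSERT statement per batch of values."""
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        # Naive escaping of single quotes; prefer parameter binding.
        rows = ",".join("('{}')".format(v.replace("'", "''")) for v in batch)
        yield f"insert into {table} values {rows};"

# 2500 values at 1000 per batch -> 3 statements
stmts = list(batched_inserts([f"val{i}" for i in range(2500)]))
```

Each yielded statement stays well under the 16 MB limit as long as `batch_size` times the typical value length is kept modest.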

Then, use that filter table as follows:

select * from master_table
where some_value in (select filter_text from myfilter);
drop table myfilter;

Upvotes: 1
