Dreamer
Dreamer

Reputation: 7551

Repeating selects or use IN clause, which is faster?

I find this becomes a common situation when design JDBC/JPA queries when using a collection as where condition for selection.

Let's say if there is a table of 50 thousand records with field order_id which is properly indexed. Now the java application have a list of 500 order ids to find order details and need to assign values to each order object. So there can be two plan

1. run 500 SELECT queries

for(String id:order_ids){
    Order order = QueryAgent.execute("SELECT * FROM ORDES o WHERE o.order_id ="+id);
    modifyOrder(order);
}

2. run one query whith 500 parameters in 

String orders_in_string = getOrdersInString(order_ids);
List<Order> orders = QueryAgent.execute("SELECT * FROM ORDES o WHERE o.order_id IN ="+orders_in_string);
for(Order order:orders){
    modifyOrder(order);
}

I cannot tell which one get better performance.

Upvotes: 1

Views: 92

Answers (2)

Hannes
Hannes

Reputation: 2073

If you do you benchmarks, please check first if you can have 500 args in an IN statement. Second retrieve necessary data only, and third try prepared statements instead. But, if by any chance you can have the database select you 500 order ids in one or two queries, try to join tables or subselect data. This is what relational databases are meant for.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270483

The two queries are going to return the same volume of results. So, your real question is which is faster: running 500 small queries or one large query to get the same results.

The proper answer to a performance question is to encourage you to try it on your system and see which is faster on your data in your environment.

In this case, there is every reason to expect that a single query would be faster. SQL incurs overhead in parsing queries (or finding already parsed equivalent queries), in starting a query, and in preparing results. All of these happen once per query. So, I would expect one query to be faster.

If the list of order_ids is already coming from the database, I would recommend building a more complex query, so they don't have to go back to the application. Something like:

select o.*
from orders o
where o.order_id in (select . . . );

Also, if you don't need all the columns, you should just explicitly choose the ones you want. Actually, it is a good idea to be always explicit in the columns being chosen.

Upvotes: 6

Related Questions