dwjohnston
dwjohnston

Reputation: 11870

Is running several single 'select where id =...' worse than running a single 'select where id in (...)'?

Say I have two tables :

ID      ATTRIBUTE
------ -----------
1        'FOO'
2        'BAR'
4        'BIZZ'

ID      ATTRIBUTE2
------ -----------
1        'FOO2'
2        'BAR2'
3        'BIZZ2'

Now I want to fetch all these attributes.

The way I see it, I have two ways of doing it, I can either query both tables, one ID at a time, or I can query both tables for all the IDs and go through the resulting set to extract the attributes.

Option 1:

List<MyObj> myObjs = new ArrayList<MyObj>();
for (int i = 1; i<5; i++)
{
    String sql1 = "select  attribute from table1 where id = ?";
    String sql2 = "select attribute2 from table2 where id = ?";
    PreparedStatement ps = conn.prepareStatement(sql);
    ps.setInt(1, i); 
    PreparedStatement ps2 = conn.prepareStatement(sql2);
    ps2.setInt(1,i);  

    ResultSet rs1 = ps.executeQuery();
    ResultSet rs2 = ps2.executeQuery();

    myObjs.add(new MyObj(i, rs1.getString("attribute1"), rs2.getString("attribute2")); 
    //No null handling but.

}

Option 2:

    String sql1 = "select id, attribute from table1 where id in (?)"; 
    String sql2 = "select id, attribute2 from table2 where id in (?)";
    PreparedStatement ps = conn.prepareStatement(sql);
    ps.setArray(1, new ArrayList<Integer>(1,2,3,4,5); 
    PreparedStatement ps2 = conn.prepareStatement(sql2);
    ps2.setArray(1, new ArrayList<Integer>(1,2,3,4,5);

    ResultSet rs1 = ps.executeQuery();
    ResultSet rs2 = ps2.executeQuery();

    Hashmap<Integer,MyObj> myObjs = new HashMap<Integer, MyObj>();
    while (rs.next())
          {
          myOjbs.add(rs.getInt("id")), new MyObj(rs.getInt("id"), rs.getString("attribute")));
          }


    while (rs2.next())
         {
              myObjs.get(rs2.getInt("id")).setAttribute2(rs2.getString("attribute2"))
         }

Option 1 seems a lot simpler for handling scenarios where one of the attributes might be missing. However, there are repeated SQL calls which might repeat millions of times. Option 2 has less SQL calls, but requires handling scenarios where the lists might be out of sync.

The question is - is one of these solutions obviously wrong?

Upvotes: 1

Views: 62

Answers (2)

John Bollinger
John Bollinger

Reputation: 181034

Option 1 assumes that both queries will return a result every time they are executed. Option 2 assumes that every row returned by the second query will correspond to one returned by the first query. Neither of these is an altogether safe assumption. At minimum your Java code should check these conditions when processing the query results, so as to handle such issues and/or bail out gracefully when necessary.

With that said, option 2 is likely to be more efficient than option 1, but even option 2 seems to be missing an opportunity to leverage the capabilities of the database. Why not instead return all the data via one query? That might look like this:

select t1.id, t1.attribute, t2.attribute2
from table1 t1 join table2 t2 on t2.id = t1.id
where t1.id in (?)

That takes care of some of the problems I pointed out by ensuring that there is both an attribute1 and an attribute2 for each row returned. It will return no row for any id that doesn't have both. If you want to diagnose the case where some ids are missing one attribute or the other, then replacing the (inner) join in that query with one of the flavors of outer join would yield results that inform about cases where only one of the two attributes is given for a particular id.

Note, too, that you cannot use a union [all] for this job, because that would not enable you to distinguish the attribute1 values from the attribute2 values, which your Java code shows you want to do.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270483

If you want to fetch the id, attribute pairs, then use one query:

select id, attribute from table1 where id in (?)
union all
select id, attribute2 from table2 where id in (?)

Although the best answer to performance questions is "try it on your system with your data and see what happens", there are some guidelines. Fewer queries is generally better, because there is an overhead to getting data in and out of the database.

Upvotes: 0

Related Questions