Continuation - Viewing FIRST_ROWS before query completes

Question

I have identified the query constructs my users normally use. Would it make sense for me to create composite indexes to support those constructs and provide FIRST_ROWS capability?

If I migrate from SE to IDS, I will lose the ability to write low-level functions with C-ISAM calls, but gain FIRST_ROWS along with other goodies like: SET-READS for index scans (onconfig USE_[KO]BATCHEDREAD), optimizer directives, parallel queries, etc.

Information from Comments

Pawnshop production tables are queried by: customer.name char(30) using wildcards (LASSURF* to find LASTNAME SURNAME, FIRSTNAME) or queried by pawns.ticket_number INT. Customer and pawns are joined by: customer.name = pawns.name, not customer.serial = pawns.fk. Pawns with trx date older than 1 year are moved to historical table (>500K nrows) in a different database, on another hard disk. Index on historical is by trx_date descending. This is where the ad-hoc composite query constructs come into play.

Once a customer's pawn transaction is found, the row is updated when an intrest or redeem pymt is made by the customer. If customers don't make a pymt in 90 days, users will mananually update which pawns they will forfeit. pawns.status changes to inactive when a customer redeems a pawn or forfeits it for lack of pymt. inactives are moved out of pawns table into historical table when their trx dates are older than 1 year, so no mass-updating occurs in this app. Pawnshops run this proc every morning before opening business.

{ISQL 2.10.06E (SE-DOS16M protected mode) pawns table optimization - 
 once-daily, before start of business, procedure}

 unload to "U:\UNL\ACTIVES.UNL"
    select * from pawns where pawns.status = "A"
  order by pawns.cust_name, pawns.trx_date;

 unload to "U:\UNL\INACTIVE.UNL"
    select * from pawns
     where pawns.status <> "A"
       and pawns.trx_date >= (today - 365)
  order by pawns.cust_name, pawns.trx_date desc;

 unload to "U:\UNL\HISTORIC.UNL"
    select * from pawns
     where pawns.status <> "A"
       and pawns.trx_date < (today - 365)
  order by pawns.trx_date desc;

 drop table pawns;

 create table pawns
 (
     trx_num serial,
     cust_name char(30),
     status char(1),
     trx_date date,
 . . . ) in "S:\PAWNSHOP.DBS\PAWNS";

 load from "U:\UNL\ACTIVES.UNL" insert into pawns;         {500:600 nrows avg.}
 load from "U:\UNL\INACTIVE.UNL" insert into pawns;        {6500:7000 nrows avg.}
 load from "U:\UNL\HISTORIC.UNL" insert into dss:historic; {>500K nrows}

 create cluster index pa_cust_idx on pawns (cust_name);

 {this groups each customers pawns together, actives in
  oldest trx_date order first, then inactive pawns within the last year in most 
  recent trx_date order. inactives older than 1 year are loaded into historic 
  table in a separate database, on a separate hard disk. historic table 
  optimization is done on a weekly basis for DSS queries.}

 create unique index pa_trx_num_idx on pawns (trx_num);
 create index pa_trx_date_idx on pawns (trx_date);
 create index pa_status_idx on pawns (status);

 {grant statements...}

 update statistics;

Jonathan Leffler · Accepted Answer

There isn't a simple yes/no answer - it is a balancing act, as with so many performance issues.

There are two main costs associated with indexes which must be balanced against the benefits.

Indexes must be maintained as rows are added, deleted, modified in the table. The cost is not huge, but neither is it negligible.
Indexes occupy disk space.

There is also a small overhead when queries are optimized simply because there are more indexes to consider.

The primary benefit of good indexes is vastly improved performance on selecting data when the index can be used to good effect.

If your tables are not very volatile and are frequently searched with criteria where the indexes can help, then it probably makes sense to create the composite indexes, assuming that disk space is not an issue.

If your tables are very volatile, or if a specific index will seldom be used (but is beneficial on those few occasions when it is used), then you should perhaps weigh the almost one-off cost of a slower query against the cost of storing and maintaining the index for those few occasions when it can be used.

There is a quite good book on the subject of index design: Relational Database Index Design and the Optimizers by Lahdenmäki and Leach (it is also fairly expensive).

In the latest comment, Frank says:

[L]ooking for a couple of things. As its already been said, the simplest thing to do is to allow Informix to start returning rows once it has them. (Oracle does this by default.) The larger picture to what Frank is asking for is something similar to what Google has. Ok it really goes back to Alta Vista and the 90's when talking about search indexes on the web. The idea is that you can do a quick search, pick up the first n things while reporting a "number" of rows returned in the search. (As if the number reported by Google is accurate.)

This additional comment from Frank makes more sense in the context of the question for which this is a continuation.

Obviously, unless the SQL statement forces Informix to do a sort, it makes results available as soon as it has them; it always has. The FIRST_ROWS optimization hint indicates to IDS that if it has a choice of two query plans and one will let it produce the first rows more quickly than the other, then it should prefer the one that produces the first rows quickly, even if it is more expensive overall than the alternative. Even in the absence of the hint, IDS still tries to make the data available as quickly as possible - it just also tries to do it as efficiently as possible too.

When the query is prepared, you get an estimate of how many rows may be returned - you could use that as an indicator (a few, quite a lot, very many). Separately, you can quickly and independently discover the number of rows in the main table you are searching. Given this metadata, you can certainly use a technique with a scroll cursor to give you a backing store in the database that contains the primary key values of the rows you are interested in. At any time, you can load an array with the display data for a set of interesting rows for display to the user. On user request, you can arrange to display another page full of information. At some point in the proceedings, you will find that you've reached the end of the data in the scroll cursor. Clearly, if you do FETCH LAST, you force that to happen. If you just do a few more FETCH NEXTs, then you will eventually get a NOTFOUND condition.

All of this has been possible with Informix (IDS and its prior incarnations, OnLine, Turbo, SE, plus I4GL) since the late 80s. The FIRST_ROWS optimization is more recent; it is still just a hint to the optimizer, and usually makes little difference to what the optimizer does.

Continuation - Viewing FIRST_ROWS before query completes

Information from Comments

Answers (1)

Related Questions