zr00
zr00

Reputation: 548

Query with inequality in index using Progress

What is the fastest way to query for records using an inequality in Progress 4GL? For example, if I need to find all the records whose state field doesn't match 'MI', how would I write that for best performance and why?

Various solutions I have been told include using a broader or different index and then using an IF statement, to avoid any use of inequality, such as:

FOR EACH record NO-LOCK:
  IF record.state = "MI" THEN NEXT.
  /*do stuff*/
END.

I've been told to avoid using NE statements, as they kill performance,

FOR EACH record NO-LOCK
  WHERE record.state NE "MI":
  /*do stuff slowly, apparently*/
END.

but I've also been told using OR is evil as well.

FOR EACH record NO-LOCK
  WHERE record.state = "WI" OR "AL":
  /*didn't write all 49 minus MI for space*/
END.

I've not been given substantive evidence for why any of these three would be superior, and there isn't sufficient data in my development environment to test with the actual situation I'm working on.

Upvotes: 2

Views: 2009

Answers (2)

Tom Bascom
Tom Bascom

Reputation: 14020

It all depends on what you are really trying to do and what the real data and indexes actually are.

Luckily you can test these things using a harness similar to the following:

define variable i  as integer no-undo.
define variable lr as integer no-undo.

find _myconnection no-lock.

find _userio no-lock where _userio-id = _myconn-userid + 1.

lr = _userio-dbaccess.

for each metric-desc no-lock              /* query to test goes here... */
    where db-id < 1600 or db-id > 1600:

  i = i + 1.

end.

find _userio no-lock where _userio-id = _myconn-userid + 1.
lr = _userio-dbaccess - lr.

display lr i.

In this case I have a table called "metric-desc". It happens to have an index whose leading component is a field called "db-id". Sort of like having a field called "state".

It has 13,358 records in it.

There are 52 records with a db-id = 1600 (I plucked 1600 out of thin air, rather like state = "MI").

If I comment out the WHERE clause and read the whole thing I get 26,994 "logical reads" (index entries plus records) This is the work the the db engine does to resolve the query.

If I use "db-id <> 1600" (similar to "state <> 'MI' " in your example) I get the same result -- the whole table is scanned.

If I replace that where clause with the OR that is shown then it takes 26,892 logical reads -- the records with db-id = 1600 are not read.

As far as general rules of thumb go, "=" is your friend, "<>" is usually bad (as is NOT). If you can think of a way to express something as "=" that is going to work best. Range matches ("<", "<=", ">", ">=" are not as good but they are less bad than "<>") OR might be ok -- it depends on the indexes and how you put the query together.

The most important index rule is to use as many equality matches on leading components as possible. As soon as you depart from that rule you are making compromises.

Upvotes: 2

Tim Kuehn
Tim Kuehn

Reputation: 3251

It all depends on how well your query matches up with an available index.

Your first example does what is called a "table scan" - it'll look at every single record in the table before doing the IF to see if it's the one you want. Most of the time this is not what you want, particularly if the table is either large or frequently queried.

equals "=" is the most performant, particularly when there's an index on the field or fields you're querying.

"OR" can be evil if it's combined with an "AND" like so:

WHERE customer.AmountDue > SomeValue AND 
      (customer.state = "MI" OR customer.state = "WI").

The reason is that the db engine can't do any index lookup with the ORs, so it'll resolve the ">" operator and then check every record that matches the ">" to see if it matches either of the two states.

This can be fixed by refactoring the WHERE like so:

WHERE (customer.AmountDue > SomeValue AND customer.state = "MI") OR 
      (customer.AmountDue > SomeValue AND customer.state = "WI").

with this structure the db engine has two AND phrases it can resolve to a smaller set of results, merge the two lists together, and the end result is a single set of records for the query to traverse. This is much faster than using the OR in the first part of your question.

It all comes down to the query matching an index on the table you're querying. If there's an index that exactly matches what you're looking for, it'll go a lot faster than if there's an index that partially matches your query, or if there's no matching index at all.

What you need to do is check out some of the excellent presentations given at the different PUG Challenge conferences. You can find a presentation on index selection given at PUG Challenge Americas here:

pugchallenge.org/downloads2015.html

You can find presentations given at PUG Challenge EMEA here in the "prior events" tab of http://pugchallenge.eu

Good luck!

Upvotes: 2

Related Questions