Reputation: 2790
I have a function returning a person's id based on name, which is not unique. The arguments are last name, first name and organization; the firstname argument can be the name, its initial letter, or NULL. Here is the key query picking one of the namesakes (function init()
returns the first letter of a string followed by a fulstop):
SELECT p.id INTO _p FROM person p
WHERE p.lastname = _lastname AND ( _firstname IS NULL
OR inic(p.firstname) = _inic )
ORDER BY (p.firstname = _firstname) DESC, (p.organization = _org ) DESC,
( SELECT person_score( p.id ) ) DESC
LIMIT 1;
The person_score
function is quite expensive - it searches several tables for the person's past activity. The database is not full (just few thousand rows in the tested tables) and there are indexes on the important columns, but still calling the scoring function makes the person-returning function seven times slower. This wouldn't be a problem if it slowed only the queries where the ordering is really needed - complete namesakes working for the same organizations are rare. Unfortunately, EXPLAIN ANALYZE shows that the scoring function is called even if there is just one person with given last name.
Is there any way to ensure that the last condition gets evaluated only when needed without breaking the query into more queries? If not, how to make the split (and passing the data between them) as fast as possible?
One possible solution would be to rank the query over equivalence of the first names and organizations, store it in an array (instead of plain integer like now) and then run the scoring function only if the array is longer than 1 integer. However, this feels clumsy and I'm afraid that this wouldn't make the function much faster in ordinary situations (and slower for namesakes). I also have just a very rough idea how to do it and I don't want to start trying before I know that it's necessary.
Upvotes: 0
Views: 107
Reputation: 39004
When you need to use a rather time consuming calculation in a query, the optimal solution, if possible, is to have that value pre-calculated in a different table.
An usual case is when you have accounts with lots of entries and you need to query the balance frequently. Usually these systems keep the accounts balances in a different table, and keep it updated.
There are two possibilities with precalculated values:
they need to be exact, and be available in real-time: in this case you need to implement a solution that keeps them up to date. Ideally the application should be responsible for doing so, but on some occasions you cannot modify the application, for example because it's not yours, or the data comes from several different applications. In these cases you can resort to triggers
they don't need to be exact and available in real-time: you can still use some of the previous solutions, or you can make it in a simpler way: prepare a stored procedure or query that calculates the required data and run it whenever you need, for example manually, or as a recurent job (ideally at low workload times, if it's too time consuming), or trigeered by some kind of event.
In your particular case you'd need a table to keep the user's scores, and implement any of this solutions to update it as required.
Upvotes: 2
Reputation: 94884
So, when there is more than one result record and you order by (p.firstname = _firstname) DESC, (p.organization = _org ) DESC
and you still get duplicates, then you want to apply that function for the first ranked records. So use an acccording CASE constuct with window functions in your ORDER BY.
SELECT p.id INTO _p
FROM person p
WHERE p.lastname = _lastname
AND ( _firstname IS NULL OR inic(p.firstname) = _inic)
ORDER BY (p.firstname = _firstname) DESC, (p.organization = _org) DESC,
CASE WHEN COUNT(*) OVER() > 1 THEN
CASE WHEN COUNT(*) OVER(PARTITION BY (p.firstname = _firstname), (p.organization = _org)) > 1 THEN
CASE WHEN RANK() OVER(ORDER BY (p.firstname = _firstname) DESC, (p.organization = _org) DESC) = 1 THEN
person_score(p.id)
END
END
END
LIMIT 1;
This should save you some unnecessary function calls. However, the dbms has to do some aggregations in order to find out whether a call must be done or not. So this may be faster or not. Just try.
Upvotes: 2