In a PostgreSQL 9.5.3 DB, I have a credit_card_balances table referencing a persons table which tracks the balances of various credit cards associated to a particular person: CREATE TABLE persons ( id serial PRIMARY KEY, name text ); CREATE credit_card_balances ( id serial PRIMARY KEY, card_provider text, person int REFERENCES persons, balance decimal, timestamp timestamp ); Example row for credit_card_balances : id | card_provider | person | balance | timestamp 123 | visa | 1234 | 1.00 | 16-07-26 17:00 I need to retrieve the collection of people who have both a 'visa' and an 'amex' card, such that the most recent balance on the 'visa' card is larger than the most recent balance on the 'amex' card. For each (person, card_provider) there may be up to around 100 rows in the table. Ideally, the output columns would be: person, provider1_balance, provider2_balance, provider1_timestamp, provider2_timestamp I know that I can do something like SELECT DISTINCT ON (card_provider) * FROM credit_card_balances WHERE person=1234 ORDER BY card_provider, timestamp DESC; to get the most recent balances for each card for a particular person. But I'm not sure how to do that over all people and verify the conditions above, or if this is even the correct approach. Edit: AS partially suggested in an answer, I can also do something like SELECT * from credit_card_balances b1, credit_card_balances b2 WHERE b1.person = b2.person AND (b1.card_provider = 'amex' AND b1.timestamp in (SELECT MAX(time_stamp) FROM credit_card_balances WHERE card_provider = 'amex')) AND (b2.card_provider = 'visa' AND <... same as above>) AND b1.balance > b2.balance; But I noticed that this leads to horrible performance. So I think this isn't such a good option.

Select people with latest balance for one credit card being greater than for another

Reputation: 657082

This problem is a combination of two classics: greatest-n-per-group and relational-division.

Given your updated specifications and with up to around 100 rows per (person, card_provider), I would expect this query to be substantially faster than what we have so far:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM   persons p
CROSS  JOIN LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  person = p.id
   AND    card_provider = 'amex'  -- more selective credit card first to optimize
   ORDER  BY timestamp DESC
   LIMIT  1
   ) a
JOIN   LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  person = p.id
   AND    card_provider = 'visa'  -- 2nd cc
   ORDER  BY timestamp DESC
   LIMIT  1
   ) v ON v.balance > a.balance;

Index support is crucial. This would be ideal for the case:

CREATE INDEX ON credit_card_balances (person, card_provider, timestamp DESC, balance);

Adding balance as last index column only makes sense if you get index-only scans out of it.

This is assuming that timestamp is defined NOT NULL, else you may need to add need NULLS LAST to query and index.

Assuming at least several different credit cards.

DISTINCT ON for one credit card, a LATERAL subquery for the other:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'amex'  -- the more selective credit card first
   ORDER  BY person, timestamp DESC
   ) a
JOIN  LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'visa'
   AND    person = a.person
   ORDER  BY timestamp DESC
   LIMIT  1
   ) v ON v.balance > a.balance

DISTINCT ON for each credit card, then join:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'amex'
   ORDER  BY person, timestamp DESC
   ) a
JOIN  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'visa'
   ORDER  BY person, timestamp DESC
   ) v USING (person)
WHERE  v.balance > a.balance;

Or, my favorite: one DISTINCT ON for both credit cards, then filtered aggregate with a HAVING condition:

SELECT person
     , max(balance)   FILTER (WHERE card_provider = 'amex') AS amex_balance
     , max(balance)   FILTER (WHERE card_provider = 'visa') AS visa_balance
     , max(timestamp) FILTER (WHERE card_provider = 'amex') AS amex_timestamp
     , max(timestamp) FILTER (WHERE card_provider = 'visa') AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person, card_provider)
          person, card_provider, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider IN ('amex', 'visa')
   ORDER  BY person, card_provider, timestamp DESC
   ) c
GROUP  BY person
HAVING max(balance) FILTER (WHERE card_provider = 'visa')
     > max(balance) FILTER (WHERE card_provider = 'amex');

The aggregate FILTER clause requires Postgres 9.4+:

Upvotes: 3

Taleh Ibrahimli

Reputation: 759

You can do it with helf of nested select and window function

select * from (
     select *, 
       rank() over(partition by card_provider order by balance desc) as rank 
     from credit_card_balances
) credit_card_balances_ranked
where rank = 1

Upvotes: 0

alzee

Reputation: 1396

Use a self join. Something like:

SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
  AND b1.card_provider = 'amex'
  AND b2.card_provider = 'visa'
  AND b1.balance > b2.balance;

Combining this with more or less what you already came up with results in the following, using a view to keep the query easier to understand.

CREATE VIEW most_recent_balance AS
  SELECT DISTINCT ON (person, card_provider) *
    FROM credit_card_balances 
   GROUP BY id, person
   ORDER BY person, card_provider, timestamp DESC;

Substitute this most_recent_balance view in place of the table in the self join query.

Upvotes: 0

Select people with latest balance for one credit card being greater than for another

Answers (3)

Related Questions