ira
ira

Reputation: 755

Select people with latest balance for one credit card being greater than for another

In a PostgreSQL 9.5.3 DB, I have a credit_card_balances table referencing a persons table which tracks the balances of various credit cards associated to a particular person:

CREATE TABLE persons (
  id serial PRIMARY KEY,
  name text
);

CREATE credit_card_balances (
  id serial PRIMARY KEY,
  card_provider text, 
  person int REFERENCES persons,
  balance decimal, 
  timestamp timestamp
);

Example row for credit_card_balances:

id  |  card_provider | person  | balance | timestamp
123 |  visa          | 1234    | 1.00    | 16-07-26 17:00

I need to retrieve the collection of people who have both a 'visa' and an 'amex' card, such that the most recent balance on the 'visa' card is larger than the most recent balance on the 'amex' card.

For each (person, card_provider) there may be up to around 100 rows in the table. Ideally, the output columns would be:

person, provider1_balance, provider2_balance, provider1_timestamp, provider2_timestamp

I know that I can do something like

SELECT DISTINCT ON (card_provider) *
FROM credit_card_balances 
WHERE person=1234
ORDER BY card_provider, timestamp DESC;

to get the most recent balances for each card for a particular person. But I'm not sure how to do that over all people and verify the conditions above, or if this is even the correct approach.

Edit: AS partially suggested in an answer, I can also do something like

SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
AND (b1.card_provider = 'amex' 
     AND b1.timestamp in
        (SELECT MAX(time_stamp) 
         FROM credit_card_balances 
         WHERE card_provider = 'amex'))

AND (b2.card_provider = 'visa'
     AND <... same as above>)
AND b1.balance > b2.balance;

But I noticed that this leads to horrible performance. So I think this isn't such a good option.

Upvotes: 4

Views: 303

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657082

This problem is a combination of two classics: and .

Given your updated specifications and with up to around 100 rows per (person, card_provider), I would expect this query to be substantially faster than what we have so far:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM   persons p
CROSS  JOIN LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  person = p.id
   AND    card_provider = 'amex'  -- more selective credit card first to optimize
   ORDER  BY timestamp DESC
   LIMIT  1
   ) a
JOIN   LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  person = p.id
   AND    card_provider = 'visa'  -- 2nd cc
   ORDER  BY timestamp DESC
   LIMIT  1
   ) v ON v.balance > a.balance;

Index support is crucial. This would be ideal for the case:

CREATE INDEX ON credit_card_balances (person, card_provider, timestamp DESC, balance);

Adding balance as last index column only makes sense if you get index-only scans out of it.

This is assuming that timestamp is defined NOT NULL, else you may need to add need NULLS LAST to query and index.

Related:


For only few rows per (person, card_provider) an approach with DISTINCT ON might be faster. A separate persons table wouldn't help. The sweet spot depends on many factors.

Assuming at least several different credit cards.

DISTINCT ON for one credit card, a LATERAL subquery for the other:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'amex'  -- the more selective credit card first
   ORDER  BY person, timestamp DESC
   ) a
JOIN  LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'visa'
   AND    person = a.person
   ORDER  BY timestamp DESC
   LIMIT  1
   ) v ON v.balance > a.balance

DISTINCT ON for each credit card, then join:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'amex'
   ORDER  BY person, timestamp DESC
   ) a
JOIN  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'visa'
   ORDER  BY person, timestamp DESC
   ) v USING (person)
WHERE  v.balance > a.balance;

Or, my favorite: one DISTINCT ON for both credit cards, then filtered aggregate with a HAVING condition:

SELECT person
     , max(balance)   FILTER (WHERE card_provider = 'amex') AS amex_balance
     , max(balance)   FILTER (WHERE card_provider = 'visa') AS visa_balance
     , max(timestamp) FILTER (WHERE card_provider = 'amex') AS amex_timestamp
     , max(timestamp) FILTER (WHERE card_provider = 'visa') AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person, card_provider)
          person, card_provider, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider IN ('amex', 'visa')
   ORDER  BY person, card_provider, timestamp DESC
   ) c
GROUP  BY person
HAVING max(balance) FILTER (WHERE card_provider = 'visa')
     > max(balance) FILTER (WHERE card_provider = 'amex');

The aggregate FILTER clause requires Postgres 9.4+:

Upvotes: 3

Taleh Ibrahimli
Taleh Ibrahimli

Reputation: 759

You can do it with helf of nested select and window function

select * from (
     select *, 
       rank() over(partition by card_provider order by balance desc) as rank 
     from credit_card_balances
) credit_card_balances_ranked
where rank = 1

Upvotes: 0

alzee
alzee

Reputation: 1396

Use a self join. Something like:

SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
  AND b1.card_provider = 'amex'
  AND b2.card_provider = 'visa'
  AND b1.balance > b2.balance;

Combining this with more or less what you already came up with results in the following, using a view to keep the query easier to understand.

CREATE VIEW most_recent_balance AS
  SELECT DISTINCT ON (person, card_provider) *
    FROM credit_card_balances 
   GROUP BY id, person
   ORDER BY person, card_provider, timestamp DESC;

Substitute this most_recent_balance view in place of the table in the self join query.

Upvotes: 0

Related Questions