I have a large dataset where we keep track of all revenue collected by client for each month. So each client gets a row for each month they were charged, resulting in a lot of rows. I am trying to find a way to pull the most recent record for each client. Here is what i have: merchant name Merchant id revenue date revenue amount fish 1234 2022-03-01 200 fish 1234 2022-04-01 200 fish 1234 2022-05-01 200 fish 1234 2022-06-01 200 dog 5678 2022-01-01 200 dog 5678 2022-02-01 200 dog 5678 2022-03-01 200 dog 5678 2022-04-01 200 cat 1011 2022-10-01 200 cat 1011 2022-11-01 200 My desired result is: merchant name Merchant id revenue date revenue amount fish 1234 2022-06-01 200 dog 5678 2022-04-01 200 cat 1011 2022-11-01 200 I have tried this: Select distinct merchant_name, merchant_id, revenue_date, revenue_amount from table where revenue_date=(select max(revenue_date) from table) but that is only returning rows that match the maximum date listed (2022-11-01). Any help is much appreciated!

Reputation: 13

How to select the most recent records in a dataset in sql

I have a large dataset where we keep track of all revenue collected by client for each month. So each client gets a row for each month they were charged, resulting in a lot of rows. I am trying to find a way to pull the most recent record for each client.

Here is what i have:

merchant name	Merchant id	revenue date	revenue amount
fish	1234	2022-03-01	200
fish	1234	2022-04-01	200
fish	1234	2022-05-01	200
fish	1234	2022-06-01	200
dog	5678	2022-01-01	200
dog	5678	2022-02-01	200
dog	5678	2022-03-01	200
dog	5678	2022-04-01	200
cat	1011	2022-10-01	200
cat	1011	2022-11-01	200

My desired result is:

merchant name	Merchant id	revenue date	revenue amount
fish	1234	2022-06-01	200
dog	5678	2022-04-01	200
cat	1011	2022-11-01	200

I have tried this:

Select distinct
merchant_name,
merchant_id,
revenue_date,
revenue_amount
from table
where revenue_date=(select max(revenue_date) from table)

but that is only returning rows that match the maximum date listed (2022-11-01).

Any help is much appreciated!

Upvotes: 0

Answers (2)

dc-ddfe

Reputation: 495

Alternate answer (formatted for SQL Server):

--For each merchantID, find the latest date
WITH maxDates AS (
    SELECT 
    merchantID,
    MAX(revenueDate) revenueDate
    FROM @input
    GROUP BY merchantID
)

--Use the defined CTE to retrieve the rest of the columns
SELECT i.* 
FROM @input i 
INNER JOIN maxDates m ON i.merchantID=m.merchantID AND i.revenueDate=m.revenueDate

Upvotes: 0

The Impaler

Reputation: 48875

The general solution takes the form:

select *
from (
  select t.*, row_number() over(partition by merchant_name 
                                order by revenue_date desc) as rn
  from t
) x
where rn = 1

Or... in PostgreSQL you can just do:

select distinct on (merchant_name) *
from t
order by merchant_name, revenue_date desc

Upvotes: 1

How to select the most recent records in a dataset in sql

Answers (2)

Related Questions