Reputation: 21
OK, I'm an advanced newbie at SQL, with enough knowledge to be dangerous.
I have pulled a set of data with a main query, and I need to exclude a subset of records from it. The main query:
`SELECT
[FundID]
,[AssetID]
,[TransactionTypeCode]
,[TradeDate]
,[NetAmountBase]
,[BrokerNumber]
,[TradeID]
,[TradeIDCancel]
FROM [DailyTradeTransactions] WITH (NOLOCK)
WHERE TransactionTypeCode IN ('SELL','CSELL')
AND TradeDate BETWEEN @StartDate AND @EndDate
AND BrokerNumber NOT IN ('12345','23456','34567','45678','56789','67891')`
This returns 382 records. I now have to eliminate all the cancelled sales, so I have a query that returns all the sales that were initiated (identified by 'SELL') and then cancelled by a reversing transaction (identified by 'CSELL'):
``SELECT TradeID ,TradeIDCancel
FROM [DailyTradeTransactions] WITH (NOLOCK)
WHERE TransactionTypeCode IN ('SELL','CSELL')
AND TradeIDCancel Is Not Null
AND TradeDate BETWEEN '4/1/2023' AND '4/7/2023'
AND BrokerNumber NOT IN ('12345','23456','34567','45678','56789','67891')``
This produces a table of 22 records, which consists of 22 TradeIDs with the corresponding 22 TradeIDCancels, which identify the 44 records that need to be excluded. The variables in the TradeIDCancel appear in the TradeID column, since the TradeIDCancel column just identifies which record reversed the corresponding TradeID. That's why I can't just exclude records with "TradeIDCancel Is Not Null" because it'll return the original cancelled trades as if they weren't cancelled.
I tried making the second query a subquery and using NOT EXISTS, but that returns no records, because every transaction has a TradeID.
It's embarrassing that I can't figure it out, but how can I exclude these 44 records?
I'm expecting 338 records, all with NULL in TradeIDCancel.
The 44 that need to be eliminated will consist of 22 records with TradeIDs and NULL in the corresponding TradeIDCancel column, and 22 records with their own TradeIDs, and the TradeIDs of the first 22 records in the corresponding TradeIDCancel column.
All help is greatly appreciated! `
Upvotes: 2
Views: 223
Reputation: 548
There are at least two ways to accomplish filter out based on 2 columns:
Based on your requirements, it looks like you actually only need one column, in which case you can also use:
Here's a script that creates a table for testing and loads sample data.
-- INIT database - Don't run this in production. Maybe use a temp table instead.
CREATE TABLE DailyTradeTransactions (
[FundID] int
,[AssetID] int
,[TransactionTypeCode] varchar(20)
,[TradeDate] datetime2
,[NetAmountBase] money
,[BrokerNumber] varchar(20)
,[TradeID] int Primary Key
,[TradeIDCancel] int
);
--Load Test Data
INSERT INTO DailyTradeTransactions
VALUES
(1,1,'SELL' ,'2024-01-20 01:00',10.12,'78912',0,null)
,(1,1,'SELL' ,'2024-01-20 01:01',10.12,'12345',1,null)
,(1,1,'SELL' ,'2024-01-20 01:02',10.12,'78912',2,null)
,(1,1,'CSELL' ,'2024-01-20 01:03',10.12,'12345',3,2)
,(1,1,'SELL','2024-01-20 01:04',10.12,'78912',4,null)
,(1,1,'SELL','2024-01-20 01:05',10.12,'12345',5,null)
,(1,1,'CSELL','2024-01-20 01:06',10.12,'78912',6,5)
,(1,1,'SELL','2024-01-20 01:07',10.12,'78912',7,null)
,(1,1,'SELL','2024-01-20 01:08',10.12,'78912',8,null)
Finally a query to return the expected results.
--Prepare Variables
DECLARE @StartDate datetime2 = '2024-01-20'
,@EndDate datetime2 = '2024-01-23'
--Filter out the cancelled results with "Not Exists"
SELECT
[FundID]
,[AssetID]
,[TransactionTypeCode]
,[TradeDate]
,[NetAmountBase]
,[BrokerNumber]
,[TradeID]
,[TradeIDCancel]
FROM [DailyTradeTransactions] t
WHERE TransactionTypeCode IN ('SELL','CSELL')
AND TradeDate BETWEEN @StartDate AND @EndDate
AND BrokerNumber NOT IN ('12345','23456','34567','45678','56789','67891')
AND TradeIDCancel IS NULL
AND NOT EXISTS (SELECT 1 FROM DailyTradeTransactions ti WHERE t.TradeID = ti.TradeIDCancel)
I loaded this into a SQLFiddle so that you can test using the same test data: https://sqlfiddle.com/sql-server/online-compiler?id=a53bb4e4-c7b7-417b-b138-1631f35056a0
Alternatively you could you a left join instead of a "not exists":
--Filter out the cancelled results with a left join
SELECT
t.[FundID]
,t.[AssetID]
,t.[TransactionTypeCode]
,t.[TradeDate]
,t.[NetAmountBase]
,t.[BrokerNumber]
,t.[TradeID]
,t.[TradeIDCancel]
FROM [DailyTradeTransactions] t
LEFT JOIN [DailyTradeTransactions] tc on tc.TradeIDCancel = t.TradeID
WHERE t.TransactionTypeCode IN ('SELL','CSELL')
AND t.TradeDate BETWEEN @StartDate AND @EndDate
AND t.BrokerNumber NOT IN ('12345','23456','34567','45678','56789','67891')
AND t.TradeIDCancel IS NULL
AND tc.TradeID IS NULL
Upvotes: 0
Reputation: 425418
You need to refer to the subquery twice, so create a CTE to do that without executing it twice, plus make the whole query more readable:
WITH cancelled AS (
SELECT TradeID ,TradeIDCancel
FROM [DailyTradeTransactions] WITH (NOLOCK)
WHERE TransactionTypeCode IN ('SELL','CSELL')
AND TradeIDCancel Is Not Null
AND TradeDate BETWEEN '4/1/2023' AND '4/7/2023'
AND BrokerNumber NOT IN ('12345','23456','34567','45678','56789','67891')
)
SELECT
[FundID]
,[AssetID]
,[TransactionTypeCode]
,[TradeDate]
,[NetAmountBase]
,[BrokerNumber]
,[TradeID]
,[TradeIDCancel]
FROM [DailyTradeTransactions] WITH (NOLOCK)
WHERE TransactionTypeCode IN ('SELL','CSELL')
AND TradeDate BETWEEN @StartDate AND @EndDate
AND BrokerNumber NOT IN ('12345','23456','34567','45678','56789','67891')`
AND TradeID NOT IN (SELECT TradeID FROM cancelled)
AND TradeIDCancel NOT IN (SELECT TradeIDCancel FROM cancelled)
Upvotes: 0