Reputation: 21
I'm attempting to determine the best way, either in SSIS or straight TSQL, to merge two rows based on a given key, but taking specific data from each row based on various aggregate rules (MAX and SUM specifically). As an example, given the following dataset:
Customer Name Total Date Outstanding
12345 A 100 7/15/2015 500
12345 200 1/1/2015 300
456 B 500 1/2/2010 100
456 B 250 2/1/2015 900
78 C 100 9/15/2015 500
I wish to consolidate those to a single row per customer key, with the following rules as an example:
The result set would be:
Customer Name Total Date Outstanding
12345 A 200 7/15/2015 800
456 B 500 2/1/2015 1000
78 C 100 9/15/2015 500
What's the best approach here? My first instinct is to query the table to join to itself on customer to get all values on a single row, and then use formulas in a Derived Column task in SSIS to determine the values to use. My concern there is that is not scalable - it works fine if I have a customer occur only twice in the main dataset, but the goal would be for the logic to work for N number of rows without needing to do a ton of rework. I'm sure there's also a TSQL approach that I'm missing here. Any help would be appreciated.
Upvotes: 2
Views: 603
Reputation: 2281
If name column in your query is not empty then you can do that simply by using aggregate function in one query
DECLARE @Customer TABLE
(
Customer INT, Name varchar(10), Total INT , PurchaseDate DATE , Outstanding INT
)
INSERT INTO @Customer
SELECT 12345,'A',100,'7/15/2015',500 UNION
SELECT 12345,'A',200,'1/1/2015',300 UNION
SELECT 456,'B',500,'1/2/2010',100 UNION
SELECT 456,'B',250,'2/1/2015',900 UNION
SELECT 78,'C',100,'9/15/2015',500
SELECT Customer,NAME ,MAX(Total), MAX(PurchaseDate), SUM(outstanding)
FROM @Customer
GROUP BY Customer, NAME
Now, if your name column is empty in few cases like you have mentioned in your example then you can update name table with correct name value
Upvotes: 1