Reputation: 45
i'm trying to come up with a way to combine rows in a table based on the longest string in any of the rows based on a row key
example
CREATE TABLE test1
(akey int not null ,
text1 varchar(50) NULL,
text2 varchar(50) NULL,
text3 varchar(50) NULL )
INSERT INTO test1 VALUES ( 1,'Winchester Road','crawley',NULL)
INSERT INTO test1 VALUES ( 1,'Winchester Rd','crawley','P21869')
INSERT INTO test1 VALUES ( 1,'Winchester Road','crawley estate','P21869')
INSERT INTO test1 VALUES ( 1,'Winchester Rd','crawley','P21869A')
INSERT INTO test1 VALUES ( 2,'','birmingham','P53342B')
INSERT INTO test1 VALUES ( 2,'Smith Close','birmingham North East','P53342')
INSERT INTO test1 VALUES ( 2,'Smith Cl.',NULL,'P53342B')
INSERT INTO test1 VALUES ( 2,'Smith Close','birmingham North','P53342')
with these rows i would be looking for the result of :
1 Winchester Road, crawley estate, P21869A
2 Smith Close, birmingham North East, P53342B
EDIT: the results above need to be in a table rather than just a comma separated string
as you can see in the result, the output should be the longest text column in the range of the 'akey' field.
i'm trying to come up with a solution that does not involve lots of subqueries on each column, the actual table has 32 columns and over 13 million rows.
the reason i'm doing this is to create a cleaned-up table that has the best results in each column for just one ID per row
this is my first post, so let me know if you need any more info, and i'm happy to hear about any best practices about posting that i've broken!
thanks
Ben.
Upvotes: 4
Views: 2567
Reputation: 161831
SELECT A.akey,
(
SELECT TOP 1 T1.text1
FROM test1 T1
WHERE T1.akey=A.akey AND LEN(T1.TEXT1) = MAX(LEN(A.text1))
) AS TEXT1,
(
SELECT TOP 1 T2.text2
FROM test1 T2
WHERE T2.akey=A.akey AND LEN(T2.TEXT2) = MAX(LEN(A.text2))
) AS TEXT2,
(
SELECT TOP 1 T3.text3
FROM test1 T3
WHERE T3.akey=A.akey AND LEN(T3.TEXT3) = MAX(LEN(A.text3))
) AS TEXT3
FROM TEST1 AS A
GROUP BY A.akey
I just realized you said you have 32 columns. I don't see a good way to do that, unless UNPIVOT would allow you to create separate rows (akey, textn) for each text* column.
Edit: I may not have a chance to finish this today, but UNPIVOT looks useful:
;
WITH COLUMNS AS
(
SELECT akey, [Column], ColumnValue
FROM
(
SELECT X.Akey, X.Text1, X.Text2, X.Text3
FROM test1 X
) AS p
UNPIVOT (ColumnValue FOR [Column] IN (Text1, Text2, Text3))
AS UNPVT
)
SELECT *
FROM COLUMNS
ORDER BY akey,[Column], LEN(ColumnValue)
Upvotes: 2
Reputation: 738
This seems really ugly, but at least works (on SQL2K) and doesn't need subqueries:
select test1.akey, A.text1, B.text2, C.text3
from test1
inner join test1 A on A.akey = test1.akey
inner join test1 B on B.akey = test1.akey
inner join test1 C on C.akey = test1.akey
group by test1.akey, A.text1, B.text2, C.text3
having len(a.text1) = max(len(test1.text1))
and len(B.text2) = max(len(test1.text2))
and len(C.text3) = max(len(test1.text3))
order by test1.akey
I must admit that it needs an inner join for each column and I wonder how this could impact on the 32 columns x 13millions record table... I try both this approach and the one based one subqueries and looked at executions plans: I'ld actually be curious to know
Upvotes: 1