Reputation: 102
SQL server 2012.
Edit: my original query was more complex than it should be, since I tried to do a Distinct query on a subset of the fields in the table and join that on the table itself to get the other (text) fields. the following query does the trick as well:
SELECT DISTINCT
p1.id
,p1.Name
,CAST( p1.[Description] AS nvarchar(max)) AS Description
,( SELECT [Category] + ', '
FROM [dbo].[Company] AS p2
WHERE p2.Id = p1.Id
ORDER BY Name
FOR XML PATH('') ) AS Categories
FROM [dbo].[Company] AS p1
ORDER BY p1.Id
I have a table with data similar to this (multiple records per company that are identical except for the category field):
+----+------+-----------------+----------+
| Id | Name | Description | Category |
+----+------+-----------------+----------+
| 1 | AAA | <loads of text> | cat1 |
| 1 | AAA | <loads of text> | cat2 |
| 2 | BBB | <even more text>| cat1 |
| 2 | BBB | <even more text>| cat3 |
+----+------+-----------------+----------+
I'm trying to make a query to get this result (unique records per company and categories aggregated into 1 field):
| 1 | AAA | <loads of text> | cat1, cat2 |
| 2 | BBB | <even more text>| cat1, cat3 |
Using information from various topics on SO I've come up with this:
SELECT
t1.Id
,t2.Name
,t2.[Description]
,t1.Category
FROM [dbo].[Company] AS t2
INNER JOIN (SELECT DISTINCT p1.Id
,( SELECT [Category] + ', '
FROM [dbo].[Company] AS p2
WHERE p2.Id = p1.Id
ORDER BY Name
FOR XML PATH('') ) AS Category
FROM [dbo].[Company] AS p1
) AS t1 ON t1.Id = t2.Id
ORDER BY t1.Id
The query result has a record for every record in the Company table, with the categories aggregated into the category field:
+----+------+-----------------+------------+
| Id | Name | Description | Category |
+----+------+-----------------+------------+
| 1 | AAA | <loads of text> | cat1, cat2 |
| 1 | AAA | <loads of text> | cat1, cat2 |
| 2 | BBB | <even more text>| cat1, cat3 |
| 2 | BBB | <even more text>| cat1, cat3 |
+----+------+-----------------+------------+
I thought INNER JOIN would only select rows if both tables have a match. The subquery produces by itself the expected result (one record per Id with the categories aggregated). I tried another group by clause over the entire query, but that fails because I can't include the Description field in the group clause since it's a text type field.
What am I missing?
Upvotes: 1
Views: 5313
Reputation: 5745
SELECT *
FROM ( SELECT t1.Id ,
t2.Name ,
t2.[Description] ,
t1.Category ,
ROW_NUMBER() OVER ( PARTITION BY t1.Id, t2.Name,
t1.Category ORDER BY t1.id ) row_num
FROM [dbo].[Company] AS t2
INNER JOIN ( SELECT DISTINCT
p1.Id ,
( SELECT [Category] + ', '
FROM [dbo].[Company] AS p2
WHERE p2.Id = p1.Id
ORDER BY Name
FOR
XML PATH('')
) AS Category
FROM [dbo].[Company] AS p1
) AS t1 ON t1.Id = t2.Id
) t1
ORDER BY t1.Id
Upvotes: 1
Reputation: 96552
I would try a DISTINCT in the outer query. This should solve your problem unless the Descriptions/Names are different for some of the rows which may be entirely possible since your database table should probably have been two tables and likely you never wrote any code to make sure description/name stayed the same for every ID. If you have a compound unique index on id, name and description, then you are likely ok.
If you do have the problem of multiple descriptions you will need to use an aggregate to fix the problem in the outer query. Or you will need to fix the data and add the unique index to prevent a future occurance.
As far as why you are having this problem, the join is working just fine but what you have is a one-many relationship between the derived table and the other table. That is why you are getting multiple records and why distinct should fix it unless the data for an id is different for name and description.
Try this:
SELECT DISTINCT
t1.Id
,t2.Name
,cast(t2.[Description] as nvarchar(max))
,t1.Category
FROM [dbo].[Company] AS t2
INNER JOIN (SELECT DISTINCT p1.Id
,( SELECT [Category] + ', '
FROM [dbo].[Company] AS p2
WHERE p2.Id = p1.Id
ORDER BY Name
FOR XML PATH('') ) AS Category
FROM [dbo].[Company] AS p1
) AS t1 ON t1.Id = t2.Id
ORDER BY t1.Id
Alternatively you can fix your bad table design.
Upvotes: 2
Reputation: 33571
To get around the nasty issue of using text as a datatype you can cast that column when you pull it. If at all possible, I would change the column permanently to varchar(max).
Something like this:
SELECT
t1.Id
, t2.Name
, cast(t2.[Description] as varchar(max)) as Description
, t1.Category
FROM [dbo].[Company] AS t2
INNER JOIN (SELECT DISTINCT p1.Id
,( SELECT [Category] + ', '
FROM [dbo].[Company] AS p2
WHERE p2.Id = p1.Id
ORDER BY Name
FOR XML PATH('') ) AS Category
FROM [dbo].[Company] AS p1
) AS t1 ON t1.Id = t2.Id
GROUP BY Id
, Name
, cast(t2.[Description] as varchar(max))
ORDER BY t1.Id
Upvotes: 1