Billy ONeal
Billy ONeal

Reputation: 106530

How can I condense strings in several string rows into a single field?

For a class project, a few others and I have decided to make a (very ugly) limited clone of StackOverflow. For this purpose, we're working on one query:

Home Page: List all the questions, their scores (calculated from votes), and the user corresponding to their first revision, and the number of answers, sorted in date-descending order according to the last action on the question (where an action is an answer, an edit of an answer, or an edit of the question).

Now, we've gotten the entire thing figured out, except for how to represent tags on questions. We're currently using a M-N mapping of tags to questions like this:

CREATE TABLE QuestionRevisions (
id INT IDENTITY NOT NULL,
question INT NOT NULL,
postDate DATETIME NOT NULL,
contents NTEXT NOT NULL,
creatingUser INT NOT NULL,
title NVARCHAR(200) NOT NULL,
PRIMARY KEY (id),
CONSTRAINT questionrev_fk_users FOREIGN KEY (creatingUser) REFERENCES
Users (id) ON DELETE CASCADE,
CONSTRAINT questionref_fk_questions FOREIGN KEY (question) REFERENCES
Questions (id) ON DELETE CASCADE
);

CREATE TABLE Tags (
id INT IDENTITY NOT NULL,
name NVARCHAR(45) NOT NULL,
PRIMARY KEY (id)
);

CREATE TABLE QuestionTags (
tag INT NOT NULL,
question INT NOT NULL,
PRIMARY KEY (tag, question),
CONSTRAINT qtags_fk_tags FOREIGN KEY (tag) REFERENCES Tags(id) ON
DELETE CASCADE,
CONSTRAINT qtags_fk_q FOREIGN KEY (question) REFERENCES Questions(id) ON
DELETE CASCADE
);

Now, for this query, if we just join to QuestionTags, then we'll get the questions and titles over and over and over again. If we don't, then we have an N query scenario, which is just as bad. Ideally, we'd have something where the result row would be:

+-------------+------------------+
| Other Stuff | Tags             |
+-------------+------------------+
| Blah Blah   | TagA, TagB, TagC |
+-------------+------------------+

Basically -- for each row in the JOIN, do a string join on the resulting tags.

Is there a built in function or similar which can accomplish this in T-SQL?

Upvotes: 2

Views: 416

Answers (3)

Mack
Mack

Reputation: 2552

Here's one possible solution using recursive CTE:

The methods used are explained here

TSQL to set up the test data (I'm using table variables):

DECLARE @QuestionRevisions TABLE  ( 
id INT IDENTITY NOT NULL, 
question INT NOT NULL, 
postDate DATETIME NOT NULL, 
contents NTEXT NOT NULL, 
creatingUser INT NOT NULL, 
title NVARCHAR(200) NOT NULL)

DECLARE @Tags TABLE ( 
id INT IDENTITY NOT NULL, 
name NVARCHAR(45) NOT NULL
)

DECLARE @QuestionTags TABLE ( 
tag INT NOT NULL, 
question INT NOT NULL
)
INSERT INTO @QuestionRevisions 
(question,postDate,contents,creatingUser,title)
VALUES
(1,GETDATE(),'Contents 1',1,'TITLE 1')

INSERT INTO @QuestionRevisions 
(question,postDate,contents,creatingUser,title)
VALUES
(2,GETDATE(),'Contents 2',2,'TITLE 2')

INSERT INTO @Tags (name) VALUES ('Tag 1')
INSERT INTO @Tags (name) VALUES ('Tag 2')
INSERT INTO @Tags (name) VALUES ('Tag 3')
INSERT INTO @Tags (name) VALUES ('Tag 4')
INSERT INTO @Tags (name) VALUES ('Tag 5')
INSERT INTO @Tags (name) VALUES ('Tag 6')

INSERT INTO @QuestionTags (tag,question) VALUES (1,1)
INSERT INTO @QuestionTags (tag,question) VALUES (3,1)
INSERT INTO @QuestionTags (tag,question) VALUES (5,1)
INSERT INTO @QuestionTags (tag,question) VALUES (4,2)
INSERT INTO @QuestionTags (tag,question) VALUES (2,2)

Here's the action part:

;WITH CTE ( id, taglist, tagid, [length] ) 
      AS (  SELECT question, CAST( '' AS VARCHAR(8000) ), 0, 0
            FROM @QuestionRevisions qr
            GROUP BY question
            UNION ALL
            SELECT qr.id
                ,  CAST(taglist + CASE WHEN [length] = 0 THEN '' ELSE ', ' END + t.name AS VARCHAR(8000) )
                ,  t.id
                ,  [length] + 1
            FROM CTE c 
            INNER JOIN @QuestionRevisions qr ON c.id = qr.question
            INNER JOIN @QuestionTags qt ON qr.question=qt.question
            INNER JOIN @Tags t ON t.id=qt.tag
            WHERE t.id > c.tagid )
SELECT id, taglist 
FROM ( SELECT id, taglist, RANK() OVER ( PARTITION BY id ORDER BY length DESC )
         FROM CTE ) D ( id, taglist, rank )
WHERE rank = 1;

Upvotes: 2

Bert
Bert

Reputation: 82459

This is a common question that comes up quite often phrased in a number of different ways (concatenate rows as string, merge rows as string, condense rows as string, combine rows as string, etc.). There are two generally accepted ways to handle combining an arbitrary number of rows into a single string in SQL Server.

The first, and usually the easiest, is to abuse XML Path combined with the STUFF function like so:

select rsQuestions.QuestionID,
       stuff((select ', '+ rsTags.TagName
              from @Tags rsTags  
              inner join @QuestionTags rsMap on rsMap.TagID = rsTags.TagID
              where rsMap.QuestionID = rsQuestions.QuestionID
              for xml path(''), type).value('.', 'nvarchar(max)'), 1, 1, '')
from @QuestionRevisions rsQuestions 

Here is a working example (borrowing some slightly modified setup from Mack). For your purposes you could store the results of that query in a common table expression, or in a subquery (I'll leave that as an exercise).

The second method is to use a recursive common table expression. Here is an annotated example of how that would work:

--NumberedTags establishes a ranked list of tags for each question.
--The key here is using row_number() or rank() partitioned by the particular question
;with NumberedTags (QuestionID, TagString, TagNum) as
(
    select  QuestionID,
            cast(TagName as nvarchar(max)) as TagString,
            row_number() over (partition by QuestionID order by rsTags.TagID) as TagNum
    from @QuestionTags rsMap
    inner join @Tags rsTags on rsTags.TagID = rsMap.TagID
),
--TagsAsString is the recursive query
TagsAsString (QuestionID, TagString, TagNum) as
(
    --The first query in the common table expression establishes the anchor for the 
    --recursive query, in this case selecting the first tag for each question
    select  QuestionID,
            TagString,
            TagNum
    from NumberedTags 
    where TagNum = 1
        
    union all
    
    --The second query in the union performs the recursion by joining the 
    --anchor to the next tag, and so on...
    select  NumberedTags.QuestionID,
            TagsAsString.TagString + ', ' + NumberedTags.TagString,
            NumberedTags.TagNum
    from    NumberedTags
    inner join TagsAsString on TagsAsString.QuestionID = NumberedTags.QuestionID
                           and NumberedTags.TagNum = TagsAsString.TagNum + 1
)
--The result of the recursive query is a list of tag strings building up to the final
--string, of which we only want the last, so here we select the longest one which
--gives us the final result
select QuestionID, max(TagString) 
from TagsAsString                         
group by QuestionID

And here is a working version. Again, you could use the results in a common table expression or subquery to join against your other tables to get your ultimate result. Hopefully the annotations help you understand a little more how the recursive common table expression works (though the link in Macks answer also goes into some detail about the method).

There is, of course, another way to do it, which doesn't handle an arbitrary number of rows, which is to join against your table aliased multiple times, which is what you did in your answer.

Upvotes: 1

Billy ONeal
Billy ONeal

Reputation: 106530

This was the solution I ended up settling on. I checkmarked Mack's answer because it works with arbitrary numbers of tags, and because it matches what I asked for in my question. I ended up though going with this, however, simply because I understand what this is doing, while I have no idea how Mack's works :)

WITH tagScans (qRevId, tagName, tagRank)
AS (
    SELECT DISTINCT
        QuestionTags.question AS qRevId,
        Tags.name AS tagName,
        ROW_NUMBER() OVER (PARTITION BY QuestionTags.question ORDER BY Tags.name) AS tagRank
    FROM QuestionTags
    INNER JOIN Tags ON Tags.id = QuestionTags.tag
)
SELECT
    Questions.id AS id,
    Questions.currentScore AS currentScore,
    answerCounts.number AS answerCount,
    latestRevUser.id AS latestRevUserId,
    latestRevUser.caseId AS lastRevUserCaseId,
    latestRevUser.currentScore AS lastRevUserScore,
    CreatingUsers.userId AS creationUserId,
    CreatingUsers.caseId AS creationUserCaseId,
    CreatingUsers.userScore AS creationUserScore,
    t1.tagName AS tagOne,
    t2.tagName AS tagTwo,
    t3.tagName AS tagThree,
    t4.tagName AS tagFour,
    t5.tagName AS tagFive
FROM Questions
INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
INNER JOIN
(
    SELECT
        Questions.id AS questionId,
        MAX(QuestionRevisions.id) AS maxRevisionId
    FROM Questions
    INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
    GROUP BY Questions.id
) AS LatestQuestionRevisions ON QuestionRevisions.id = LatestQuestionRevisions.maxRevisionId
INNER JOIN Users AS latestRevUser ON latestRevUser.id = QuestionRevisions.creatingUser
INNER JOIN
(
    SELECT
        QuestionRevisions.question AS questionId,
        Users.id AS userId,
        Users.caseId AS caseId,
        Users.currentScore AS userScore
    FROM Users
    INNER JOIN QuestionRevisions ON QuestionRevisions.creatingUser = Users.id
    INNER JOIN
    (
        SELECT
            MIN(QuestionRevisions.id) AS minQuestionRevisionId
        FROM Questions
        INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
        GROUP BY Questions.id
    ) AS QuestionGroups ON QuestionGroups.minQuestionRevisionId = QuestionRevisions.id
) AS CreatingUsers ON CreatingUsers.questionId = Questions.id
INNER JOIN
(
    SELECT
        COUNT(*) AS number,
        Questions.id AS questionId
    FROM Questions
    INNER JOIN Answers ON Answers.question = Questions.id
    GROUP BY Questions.id
) AS answerCounts ON answerCounts.questionId = Questions.id
LEFT JOIN tagScans AS t1 ON t1.qRevId = QuestionRevisions.id AND t1.tagRank = 1
LEFT JOIN tagScans AS t2 ON t2.qRevId = QuestionRevisions.id AND t2.tagRank = 2
LEFT JOIN tagScans AS t3 ON t3.qRevId = QuestionRevisions.id AND t3.tagRank = 3
LEFT JOIN tagScans AS t4 ON t4.qRevId = QuestionRevisions.id AND t4.tagRank = 4
LEFT JOIN tagScans AS t5 ON t5.qRevId = QuestionRevisions.id AND t5.tagRank = 5
ORDER BY QuestionRevisions.postDate DESC

Upvotes: 1

Related Questions