Bebass
Bebass

Reputation: 141

Algorithm to do a summation over a column with random selection of data from other column

I have a table like this:

CREATE TABLE Table1
    ([IdeaNr] int, [SubmitterName] varchar(4), [SubmitterDepartment] varchar(4))
;

INSERT INTO Table1
    ([IdeaNr], [SubmitterName], [SubmitterDepartment])
VALUES
    (1, 'Joe', 'Org1'),
    (1, 'Bill', 'Org2'),
    (1, 'Kate', 'Org1'),
    (1, 'Tom', 'Org3'),
    (2, 'Sue', 'Org2'),
    (3, 'Bill', 'Org2'),
    (3, 'Fred', 'Org1'),
    (4, 'Ted', 'Org3'),
    (4, 'Kate', 'Org1'),
    (4, 'Hank', 'Org3')
;

I want get the following result from a query:

IdeaNr  SubmitterCount   SubmitterRndName   SubmitterRndDepartment
1       4                Joe or ...         Org1 (if Joe)
2       1                Sue                Org2
3       2                Bill or ...        Org2 (if Bill)
4       3                Ted or ...         Org3 (if Ted)

I have tried a lot of things with all kind of JOINs of Table1 with itself, derived tables and GROUP BY, e.g.:

SELECT COUNT(IdeaNr) AS SubmitterCount,IdeaNr,SubmitterName,SubmitterDepartment
FROM Table1
GROUP BY IdeaNr,SubmitterName,SubmitterDepartment

I think the problem is to create an algorithm that takes just the first (or a random) name and department appearing in a group of IdeaNr. It is absolutely clear that you can get to misleading interpretations of that kind of data, e. g.:

But this kind of "wrong averaging" is OK for the task. Can you help?

EDIT: The expected result evolved during the discussion. The desired result changed to:

IdeaNr  SubmitterCount   SubmitterRndName   SubmitterRndDepartment
1       4                Joe, Bill, ...     GroupIdea
2       1                Sue                Org2
3       2                Bill, Fred         GroupIdea
4       3                Ted, ...           GroupIdea

Upvotes: 4

Views: 80

Answers (2)

Gottfried Lesigang
Gottfried Lesigang

Reputation: 67311

Try it like this:

DECLARE @Table1 TABLE ([IdeaNr] int, [SubmitterName] varchar(4), [SubmitterDepartment] varchar(4));
INSERT INTO @Table1([IdeaNr], [SubmitterName], [SubmitterDepartment])
VALUES
    (1, 'Joe', 'Org1'),
    (1, 'Bill', 'Org2'),
    (1, 'Kate', 'Org1'),
    (1, 'Tom', 'Org3'),
    (2, 'Sue', 'Org2'),
    (3, 'Bill', 'Org2'),
    (3, 'Fred', 'Org1'),
    (4, 'Ted', 'Org3'),
    (4, 'Kate', 'Org1'),
    (4, 'Hank', 'Org3');

SELECT x.IdeaNr
      ,Count(x.IdeaNr)
      ,MAX(Submitter.SubmitterName) AS SubmitterRndName
      ,MAX(Submitter.SubmitterDepartment) AS SubmitterRndDepartment
FROM @Table1 AS x 
CROSS APPLY
(
    SELECT TOP 1 SubmitterName, SubmitterDepartment
    FROM @Table1 AS y 
    WHERE y.IdeaNr=x.IdeaNr
) AS Submitter
GROUP BY x.IdeaNr

There is one more idea, don't know if you could need this:

SELECT x.IdeaNr
      ,Count(x.IdeaNr)
      ,STUFF(
            (
            SELECT ', ' + y.SubmitterName --maybe with DISTINCT
            FROM @Table1 AS y
            WHERE y.IdeaNr=x.IdeaNr
            FOR XML PATH('')
            ),1,2,'') AS AllSubmitters
      ,STUFF(
            (
            SELECT ', ' + z.SubmitterDepartment --maybe with DISTINCT
            FROM @Table1 AS z
            WHERE z.IdeaNr=x.IdeaNr
            FOR XML PATH('')
            ),1,2,'') AS AllDepartments
FROM @Table1 AS x 
GROUP BY x.IdeaNr

This comes back with

IdeaNr                  AllSubmitters              AllDepartments
1           4           Joe, Bill, Kate, Tom       Org1, Org2, Org1, Org3
2           1           Sue                        Org2
3           2           Bill, Fred                 Org2, Org1
4           3           Ted, Kate, Hank            Org3, Org1, Org3

EDIT: Following your idea from the last comment:

SELECT x.IdeaNr
      ,COUNT(x.IdeaNr)
      ,STUFF(
            (
            SELECT DISTINCT ', ' + y.SubmitterName 
            FROM @Table1 AS y
            WHERE y.IdeaNr=x.IdeaNr
            FOR XML PATH('')
            ),1,2,'') AS AllSubmitters
      ,CASE WHEN COUNT(x.IdeaNr)=1 THEN (SELECT TOP 1 z.SubmitterDepartment FROM @Table1 AS z WHERE z.IdeaNr=x.IdeaNr)
            ELSE 'GroupIdea' END AS Departments
FROM @Table1 AS x 
GROUP BY x.IdeaNr

Upvotes: 3

Vladimir Baranov
Vladimir Baranov

Reputation: 32693

If you want to read more about this topic search for top-N-per-group. In SQL Server it is easy to do using CROSS APPLY.

SQL Fiddle

WITH
CTE
AS
(
    SELECT
        IdeaNr
        ,COUNT(*) AS SubmitterCount
    FROM @Table1
    GROUP BY IdeaNr
)
SELECT
    CTE.IdeaNr
    ,CTE.SubmitterCount
    ,CA.SubmitterName
    ,CA.SubmitterDepartment
FROM
    CTE
    CROSS APPLY
    (
        SELECT TOP(1)
            T.SubmitterName
            ,T.SubmitterDepartment
        FROM @Table1 AS T
        WHERE T.IdeaNr = CTE.IdeaNr

        --ORDER BY T.SubmitterName
        --ORDER BY T.SubmitterDepartment
        --ORDER BY CRYPT_GEN_RANDOM(4)

    ) AS CA
ORDER BY CTE.IdeaNr;

If you don't put any ORDER BY in the CROSS APPLY part the server will pick one "random" row. It is not random as such, but results may be the same or may differ when you run this query several times. In practice, results will most likely differ if you create or drop indexes on the table, but if the table is large they may differ every time the query runs.

If you want to pick some specific row for each IdeaNr, then use ORDER BY Name or Department or some ID, etc.

If you want to pick a really random row, then ORDER BY CRYPT_GEN_RANDOM(4).

I get the following result without any ORDER BY when I use table variable for this test without any indexes:

IdeaNr    SubmitterCount    SubmitterName    SubmitterDepartment
1         4                 Joe              Org1
2         1                 Sue              Org2
3         2                 Bill             Org2
4         3                 Ted              Org3

It looks as if it picked the "first" row for each IdeaNr in the order as they were added to the table. But, don't be fooled, the order is not guaranteed without explicit ORDER BY. If you want to get the first row for each IdeaNr in the order as they were added to the table, you need to store information about this order somehow. For example, add a column ID int IDENTITY to the table that would increment automatically as new rows are added and then you can use it like this ORDER BY ID DESC to get guaranteed results.

Play with SQL Fiddle to see how it works.

Upvotes: 3

Related Questions