captncraig
captncraig

Reputation: 23108

SQL to resequence items by groups

Lets say I have a database that looks like this:

tblA:
ID, Name, Sequence, tblBID
1     a       5        14
2     b       3        15
3     c       3        16
4     d       3        17

tblB:
ID, Group
14    1
15    1
16    2
17    3

I would like to sequence A so that the sequences go 1...n for each group of B. So in this case, the sequences going down should be 1,2,1,1.

The ordering needs to be consistent with the current ordering, but there are no guarantees as to the current ordering.

I am not exactly a sql master and I am sure there is a fairly easy way to do this, but I really don't know the right route to take. Any hints?

Upvotes: 0

Views: 615

Answers (3)

ErikE
ErikE

Reputation: 50251

Because it's SQL 2000, we can't use a windowing function. That's okay.

Thomas's queries are good and will work. However, they will get worse and worse as the number of rows increases—with different characteristics depending on how wide (the number of groups) and how deep (the number of items per group). This is because those queries use a partial cross-join, perhaps we could call it a "pyramidal cross-join" where the crossing part is limited to right side values less than left side values rather than left crossing to all right values.

What to do?

I think you will be surprised to find that the following long and painful-looking script will outperform the pyramidal join at a certain size of data (which may not be all that big) and eventually, with really large data sets must be considered a screaming performer:

CREATE TABLE #tblA (
   ID int identity(1,1) NOT NULL,
   Name varchar(1) NOT NULL,
   Sequence int NOT NULL,
   tblBID int NOT NULL,
   PRIMARY KEY CLUSTERED (ID)
)

INSERT #tblA VALUES ('a', 5, 14)
INSERT #tblA VALUES ('b', 3, 15)
INSERT #tblA VALUES ('c', 3, 16)
INSERT #tblA VALUES ('d', 3, 17)

CREATE TABLE #tblB (
   ID int NOT NULL PRIMARY KEY CLUSTERED,
   GroupID int NOT NULL
)
INSERT #tblB VALUES (14, 1)
INSERT #tblB VALUES (15, 1)
INSERT #tblB VALUES (16, 2)
INSERT #tblB VALUES (17, 3)

CREATE TABLE #seq (
   seq int identity(1,1) NOT NULL,
   ID int NOT NULL,
   GroupID int NOT NULL,
   PRIMARY KEY CLUSTERED (ID)
)

INSERT #seq
SELECT
   A.ID,
   B.GroupID
FROM
   #tblA A
   INNER JOIN #tblB B ON A.tblBID = b.ID
ORDER BY B.GroupID, A.Sequence

UPDATE A
SET A.Sequence = S.seq - X.MinSeq + 1
FROM
   #tblA A
   INNER JOIN #seq S ON A.ID = S.ID
   INNER JOIN (
      SELECT GroupID, MinSeq = Min(seq)
      FROM #seq
      GROUP BY GroupID
   ) X ON S.GroupID = X.GroupID

SELECT * FROM #tblA

DROP TABLE #seq
DROP TABLE #tblB
DROP TABLE #tblA

If I understood you correctly, then ORDER BY B.GroupID, A.Sequence is correct. If not, you can switch A.Sequence to B.ID.

Also, my index on the temp table should be experimented with. For a certain quantity of rows, and also the width and depth characteristics of those rows, clustering on one of the other two columns in the #seq table could be helpful.

Last, there is a possible different data organization possible: leaving GroupID out of the #seq table and joining again. I suspect it would be worse, but am not 100% sure.

Upvotes: 2

Thomas
Thomas

Reputation: 64655

If you are using SQL Server 2005+ or higher, you can use a ranking function:

Select tblA.Id, tblA.Name
    , Row_Number() Over ( Partition By tblB.[Group] Order By tblA.Id ) As Sequence
    , tblA.tblBID
From tblA
    Join tblB
        On tblB.tblBID = tblB.ID

Row_Number ranking function.

Here's another solution that would work in SQL Server 2000 and prior.

Select A.Id, A.Name
    , (Select Count(*)
        From tblB As B1
        Where B1.[Group] = B.[Group]
            And B1.Id < B.ID) + 1 As Sequence
    , A.tblBID
From tblA As A
    Join tblB As B
        On B.Id = A.tblBID

EDIT

Also want to make it clear that I want to actually update tblA to reflect the proper sequences.

In SQL Server, you can use their proprietary From clause in an Update statement like so:

Update tblA
Set Sequence =  (
                Select Count(*)
                From tblB As B1
                Where B1.[Group] = B.[Group]
                    And B1.Id < B.ID
                ) + 1
From tblA As A
    Join tblB As B
        On B.Id = A.tblBID

The Hoyle ANSI solution might be something like:

Update tblA
Set Sequence = (
                Select (Select Count(*)
                        From tblB As B1
                        Where B1.[Group] = B.[Group]
                            And B1.Id < B.ID) + 1
                From tblA As A
                    Join tblB As B
                        On B.Id = A.tblBID
                Where A.Id = tblA.Id
                )

EDIT

Can we do that [the inner group] comparison based on A.Sequence instead of B.ID?

Select A1.*
    , (Select Count(*)
        From tblB As B2
            Join tblA As A2
                On A2.tblBID = B2.Id
        Where B2.[Group] = B1.[Group]
            And A2.Sequence < A1.Sequence) + 1
From tblA As A1
    Join tblB As B1
        On B1.Id = A1.tblBID

Upvotes: 5

user330315
user330315

Reputation:

Something like:

SELECT a.id, a.name, row_number() over (partition by b.group order by a.id)
FROM tblA a
  JOIN tblB on a.tblBID = b.ID;

Upvotes: 0

Related Questions