Nick Thomson
Nick Thomson

Reputation: 211

Updating a set of data from the next matching record

I have a table that contains the status history (including current status) of a number of resources. It looks something like this:

CREATE TABLE [dbo].[RESOURCE_STATUS](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [resource_id] [int] NOT NULL,
    [date_timestamp] [datetime] NOT NULL,
    [end_timestamp] [datetime] NULL,
    ...
)

And an example of the data ( obviously represents a valid timestamp):

+------+-------------+-----------------+---------------+
| id   | resource_id | start_timestamp | end_timestamp |
+------+-------------+-----------------+---------------+
| 1    |  1          | <valid_ts>      | <valid_ts>    |
| 2    |  2          | <valid_ts>      | <valid_ts>    |
| 3    |  3          | <valid_ts>      | NULL          |
| 4    |  1          | <valid_ts>      | NULL          |
| 5    |  2          | <valid_ts>      | NULL          |
| 6    |  1          | <valid_ts>      | <valid_ts>    |
| 7    |  2          | <valid_ts>      | NULL          |
| 8    |  1          | <valid_ts>      | NULL          |
+------+-------------+-----------------+---------------+

There are, of course, additional columns representing the status etc but I don't think they are relevant at this point.

In theory, the start_timestamp and end_timestamp in each record are supposed to indicate the date and time of the start and end of each status with a NULL end_timestamp indicating that the status is ongoing (in this case rows 3, 7 and 8 indicate ongoing statuses).

The problem I have is that in some cases (rows 4 and 5 in the example) the end_timestamp hasn't been set and to get our reporting system working properly I need to go back and set that timestamp from the start_timestamp in the next record for that resource in the set if it exists. I.E. update row 4 from row 6 and row 5 from row 7. Rows 3, 7 and 8 shouldn't be modified since they represent the current state of the resource.

Note that the row missing the end_timestamp may not be the first row for that resource_id and there may be multiple rows for one or more resources that have incorrectly NULL end_timestamps.

I need to do this both for the existing data and on an ongoing basis when data is added to the table (I know the creator of the data should be fixed, but for various reasons that isn't on the table at this point).

In case it's relevant, we're using MS SQL Server 2008 and the table currently contains just over two million rows and obviously is growing on a daily basis.

Can anyone help me out with this please?

Upvotes: 0

Views: 39

Answers (1)

The Dumb Radish
The Dumb Radish

Reputation: 896

Try this...

WITH AddRowNumber AS
(
SELECT
    id
,   resource_id
,   start_timestamp
,   end_timestamp
,   ROW_NUMBER() OVER(PARTITION BY resource_id ORDER BY start_timestamp) AS RowNumber
FROM
    @Resource_Status 
)
,   NewTimestamp AS 
(
SELECT
    A1.id
,   A1.resource_id
,   A1.start_timestamp
,   A1.end_timestamp
,   A2.start_timestamp AS NewEndTimeStamp
FROM
    AddRowNumber AS A1
INNER JOIN
    AddRowNumber AS A2
ON
    A1.resource_id = A2.resource_id
AND A1.RowNumber = A2.RowNumber - 1
WHERE
    A1.end_timestamp IS NULL
)
UPDATE
@Resource_Status
SET
end_timestamp = Nt.NewEndTimeStamp
FROM
@Resource_Status AS R
INNER JOIN
NewTimestamp AS NT
ON
R.id = NT.id

Let me know if that works.

Ash

Upvotes: 1

Related Questions