Peter
Peter

Reputation: 1065

Group query records by proximity of value

I have an audit log table, logging changes from multiple tables against a business object. The logs are timestamped, and updates for a single business object will most likely occur over a period. In other words, it may take up to 5 seconds to save a contract, and the records added or updated in that time will span across that time. From the table below, the last column shows the timestamp with with slight variance between the values.

cm_contract 1087    2013-05-20 14:30:24.713
cm_contract 1087    2013-05-20 14:30:24.717
cm_contract 1087    2013-05-20 14:30:24.750
cm_contract 1087    2013-05-20 14:30:24.763
cm_contract 1087    2013-05-20 14:30:24.817
cm_contract 1087    2013-05-20 14:30:24.833
cm_contract 1087    2013-05-20 14:30:24.837
cm_contract 1087    2013-05-20 14:30:24.843
cm_contract 1087    2013-05-20 14:30:24.850
cm_contract 1087    2013-05-20 14:30:24.853

In the viewer, I would like to summarise the data, showing the business object that has been changed along with the number of logs against that business object. In order to do this, I need to group the records by business object and key, and by those with a similar timestamp. I have achieved this using a temp table and a variable, but I ultimately want to put this into a view, so I'm wondering if there is an easier way to do this:

SELECT ROW_NUMBER() OVER (ORDER BY business_object_table, business_object_key, mod_date) AS row_num, 
        audit_trail_key, business_object_table, business_object_key, mod_date, 0 AS part
INTO #temp
FROM audit_trail WHERE business_object_table is not null

DECLARE @part INT=0

UPDATE t2 SET @part = CASE WHEN ABS(DATEDIFF(millisecond, t2.mod_date, t1.mod_date)) < 1000 THEN @part ELSE @part + 1 END, part = @part
FROM #temp t2 INNER JOIN #temp t1 ON t2.row_num = t1.row_num +1
WHERE t2.business_object_table = t1.business_object_table 
AND t2.business_object_key = t1.business_object_key

SELECT * FROM #temp 

DROP TABLE #temp

I've been looking for something in T-SQL to the effect of

CLUSTER BY ABS(DATEDIFF(millisecond, t2.mod_date, t1.mod_date)) < 1000

but keep getting redirect to SQL Server failover clusters, which is not what I'm after. Anyone have any ideas

Upvotes: 0

Views: 926

Answers (4)

Peter
Peter

Reputation: 1065

The solution I came up with is similar to the accepted answer above, but better supports varying batch sizes by using the time between two consecutive updates rather than all updates within a period. I expect a similar solution could be used for clustering geospacial data as well - I thought someone might find this interesting.

To summarise, I have used a common table expression to generate a row number for each audit record, ordered by business object type, key and date recorded.

I then use another cte to extract each record from cte1 whose previous update was more than one second prior, along with generating a consecutive row number for this resultset.

I then self-join cte2 to retrieve the summary for all audit records between each consecutive record. Sample code follows:

-- Audit record row numbers by business object table, key and mod_date
WITH cte1 AS (SELECT ROW_NUMBER() OVER (ORDER BY business_object_table, business_object_key, mod_date) AS row_num, audit_trail_key,
        business_object_table, business_object_key, mod_date, user_key, business_object_name FROM audit_trail),
-- Get audit records where previous update was more than a second prior, and include the first audit record
cte2 AS (SELECT ROW_NUMBER() OVER (ORDER BY a2.row_num) AS row_num, a2.audit_trail_key,
        a2.business_object_table, a2.business_object_key, a2.mod_date, a2.user_key, a2.business_object_name
    FROM cte1 a2 LEFT JOIN cte1 a1 ON a2.row_num = a1.row_num + 1 AND a2.business_object_table = a1.business_object_table
        AND a2.business_object_key = a1.business_object_key
    WHERE ABS(DATEDIFF(ss, a1.mod_date, a2.mod_date)) > 1 OR a1.audit_trail_key IS NULL
    AND a2.business_object_table is not null)
-- Summarise details within each cluster    
SELECT a1.audit_trail_key, a1.business_object_table, a1.business_object_key, a1.mod_date AS first_mod, 
    u.username, a1.user_key, a1.business_object_name, 
    (SELECT audit_rec_table + ': ' + CAST(COUNT(*) AS VARCHAR) + ' record' +
        CASE WHEN COUNT(*) > 1 THEN 's' ELSE '' END + CASE mod_type WHEN 'U' THEN ' Changed' WHEN 'D' 
                THEN ' Deleted' ELSE ' Added' END + CHAR(10) FROM audit_trail
        WHERE business_object_table = a1.business_object_table AND business_object_key = a1.business_object_key 
                AND mod_date >= a1.mod_date AND mod_date < ISNULL(a2.mod_date, mod_date + 1)
        GROUP BY audit_rec_table, mod_type
        FOR XML PATH('')) AS change_summary
FROM cte2 a1 LEFT JOIN cte2 a2 ON a2.row_num = a1.row_num + 1 AND a2.business_object_table = a1.business_object_table
        AND a2.business_object_key = a1.business_object_key
    INNER JOIN su_user u ON u.user_key = a1.user_key

Upvotes: 0

Stoleg
Stoleg

Reputation: 9320

I would suggest you converting your Timestamp to Smalldatetime, which keeps time with prescioson of a minite. It substitutes your Case statement. Count will do the rest.

SELECT business_object_table, business_object_key, cast (mod_date as smalldatetime) as mod_date, count (*) as No_of_Changes
FROM audit_trail 
WHERE business_object_table is not null
GROUP BY business_object_table, business_object_key, cast (mod_date as smalldatetime)
ORDER BY 3,1,2

Hope this helps.

Upvotes: 1

LoztInSpace
LoztInSpace

Reputation: 5697

SELECT CAST(CONVERT(datetime,mod_date) as float)

will give you a double representation of a date.

You can then divide that and discard some decimal places to get some kind of "similar timestamp".

Seems though, you are missing a fundamental part of your business logic - some identifier that is common against all rows taking part in a transaction. Without this, you are guessing and any kind of multi-user activity will cause a problem. I'd discard your time-based approach and propagate a transaction identifier (*not SQL transaction - a business transaction) of some sort thorough the log.

Upvotes: 1

JimmyB
JimmyB

Reputation: 12610

Maybe try something like this as a start ("pseudo-SQL", not tested):

select t1.myRowId, t1.contractId, min(t2.timestamp) - t1.timestamp as DeltaT
from myTable t1
inner join myTable t2 on t1.contractId = t2.contractId and t2.timestamp > t1.timestamp
group by t1.myRowId, t1.contractId
having min(t2.timestamp) - t1.timestamp > "60 seconds"

Upvotes: 2

Related Questions