Reputation: 1065
I have an audit log table, logging changes from multiple tables against a business object. The logs are timestamped, and updates for a single business object will most likely occur over a period. In other words, it may take up to 5 seconds to save a contract, and the records added or updated in that time will span across that time. From the table below, the last column shows the timestamp with with slight variance between the values.
cm_contract 1087 2013-05-20 14:30:24.713
cm_contract 1087 2013-05-20 14:30:24.717
cm_contract 1087 2013-05-20 14:30:24.750
cm_contract 1087 2013-05-20 14:30:24.763
cm_contract 1087 2013-05-20 14:30:24.817
cm_contract 1087 2013-05-20 14:30:24.833
cm_contract 1087 2013-05-20 14:30:24.837
cm_contract 1087 2013-05-20 14:30:24.843
cm_contract 1087 2013-05-20 14:30:24.850
cm_contract 1087 2013-05-20 14:30:24.853
In the viewer, I would like to summarise the data, showing the business object that has been changed along with the number of logs against that business object. In order to do this, I need to group the records by business object and key, and by those with a similar timestamp. I have achieved this using a temp table and a variable, but I ultimately want to put this into a view, so I'm wondering if there is an easier way to do this:
SELECT ROW_NUMBER() OVER (ORDER BY business_object_table, business_object_key, mod_date) AS row_num,
audit_trail_key, business_object_table, business_object_key, mod_date, 0 AS part
INTO #temp
FROM audit_trail WHERE business_object_table is not null
DECLARE @part INT=0
UPDATE t2 SET @part = CASE WHEN ABS(DATEDIFF(millisecond, t2.mod_date, t1.mod_date)) < 1000 THEN @part ELSE @part + 1 END, part = @part
FROM #temp t2 INNER JOIN #temp t1 ON t2.row_num = t1.row_num +1
WHERE t2.business_object_table = t1.business_object_table
AND t2.business_object_key = t1.business_object_key
SELECT * FROM #temp
DROP TABLE #temp
I've been looking for something in T-SQL to the effect of
CLUSTER BY ABS(DATEDIFF(millisecond, t2.mod_date, t1.mod_date)) < 1000
but keep getting redirect to SQL Server failover clusters, which is not what I'm after. Anyone have any ideas
Upvotes: 0
Views: 926
Reputation: 1065
The solution I came up with is similar to the accepted answer above, but better supports varying batch sizes by using the time between two consecutive updates rather than all updates within a period. I expect a similar solution could be used for clustering geospacial data as well - I thought someone might find this interesting.
To summarise, I have used a common table expression to generate a row number for each audit record, ordered by business object type, key and date recorded.
I then use another cte to extract each record from cte1 whose previous update was more than one second prior, along with generating a consecutive row number for this resultset.
I then self-join cte2 to retrieve the summary for all audit records between each consecutive record. Sample code follows:
-- Audit record row numbers by business object table, key and mod_date
WITH cte1 AS (SELECT ROW_NUMBER() OVER (ORDER BY business_object_table, business_object_key, mod_date) AS row_num, audit_trail_key,
business_object_table, business_object_key, mod_date, user_key, business_object_name FROM audit_trail),
-- Get audit records where previous update was more than a second prior, and include the first audit record
cte2 AS (SELECT ROW_NUMBER() OVER (ORDER BY a2.row_num) AS row_num, a2.audit_trail_key,
a2.business_object_table, a2.business_object_key, a2.mod_date, a2.user_key, a2.business_object_name
FROM cte1 a2 LEFT JOIN cte1 a1 ON a2.row_num = a1.row_num + 1 AND a2.business_object_table = a1.business_object_table
AND a2.business_object_key = a1.business_object_key
WHERE ABS(DATEDIFF(ss, a1.mod_date, a2.mod_date)) > 1 OR a1.audit_trail_key IS NULL
AND a2.business_object_table is not null)
-- Summarise details within each cluster
SELECT a1.audit_trail_key, a1.business_object_table, a1.business_object_key, a1.mod_date AS first_mod,
u.username, a1.user_key, a1.business_object_name,
(SELECT audit_rec_table + ': ' + CAST(COUNT(*) AS VARCHAR) + ' record' +
CASE WHEN COUNT(*) > 1 THEN 's' ELSE '' END + CASE mod_type WHEN 'U' THEN ' Changed' WHEN 'D'
THEN ' Deleted' ELSE ' Added' END + CHAR(10) FROM audit_trail
WHERE business_object_table = a1.business_object_table AND business_object_key = a1.business_object_key
AND mod_date >= a1.mod_date AND mod_date < ISNULL(a2.mod_date, mod_date + 1)
GROUP BY audit_rec_table, mod_type
FOR XML PATH('')) AS change_summary
FROM cte2 a1 LEFT JOIN cte2 a2 ON a2.row_num = a1.row_num + 1 AND a2.business_object_table = a1.business_object_table
AND a2.business_object_key = a1.business_object_key
INNER JOIN su_user u ON u.user_key = a1.user_key
Upvotes: 0
Reputation: 9320
I would suggest you converting your Timestamp to Smalldatetime, which keeps time with prescioson of a minite. It substitutes your Case statement. Count will do the rest.
SELECT business_object_table, business_object_key, cast (mod_date as smalldatetime) as mod_date, count (*) as No_of_Changes
FROM audit_trail
WHERE business_object_table is not null
GROUP BY business_object_table, business_object_key, cast (mod_date as smalldatetime)
ORDER BY 3,1,2
Hope this helps.
Upvotes: 1
Reputation: 5697
SELECT CAST(CONVERT(datetime,mod_date) as float)
will give you a double representation of a date.
You can then divide that and discard some decimal places to get some kind of "similar timestamp".
Seems though, you are missing a fundamental part of your business logic - some identifier that is common against all rows taking part in a transaction. Without this, you are guessing and any kind of multi-user activity will cause a problem. I'd discard your time-based approach and propagate a transaction identifier (*not SQL transaction - a business transaction) of some sort thorough the log.
Upvotes: 1
Reputation: 12610
Maybe try something like this as a start ("pseudo-SQL", not tested):
select t1.myRowId, t1.contractId, min(t2.timestamp) - t1.timestamp as DeltaT
from myTable t1
inner join myTable t2 on t1.contractId = t2.contractId and t2.timestamp > t1.timestamp
group by t1.myRowId, t1.contractId
having min(t2.timestamp) - t1.timestamp > "60 seconds"
Upvotes: 2