Reputation: 23
I’m working on a power bi audit log report file. The file contains a column 'AuditDate' and it has multiple columns in it. I need to split that column into multiple columns using sql.
The column has values like this
AuditDate
------------
"{""Id"":""44de2468"",""RecordType"":20,""CreationTime"":""2018-08-03T12:30:34"",""Operation"":""ViewReport"",""OrganizationId"":""779558"",""UserType"":0,""UserKey"":""FFFA3DA"",""Workload"":""PowerBI"",""UserId"":""[email protected]"",""ClientIP"":""9.5.3.26"",""UserAgent"":""Mozilla\/5.0 (Windows NT 10.0;"",""Activity"":""ViewReport"",""ItemName"":""Sales"",""WorkSpaceName"":""TeamITO"",""DatasetName"":""Sales1"",""ReportName"":""Sales1"",""WorkspaceId"":""e8eaa0ca"",""ObjectId"":""Sales1"",""DatasetId"":""4c5d-ad45-eb6546"",""ReportId"":""4cb0-99ad-de41b5160c47"",""IsSuccess"":true,""DatapoolRefreshScheduleType"":""None"",""DatapoolType"":""Undefined""}"
Basically I need to split this column into
id RecordType CreationTime Operaration OrganizationID UserType
------------------------------------------------------------------------------
44de2468 20 2018-08-03T12:30:34 ViewReport 779558 0
Can anyone help with the sql query for this?
Upvotes: 1
Views: 341
Reputation: 67291
With SQL-Server 2016 this is pretty simple. There is quite some JSON support. Your only problem is, that your string is not correct. Quite obviously there was an engine doubling all internal quotes (an escaping technique).
If this is under your control, you should try to change your column's format to proper JSON. Even better let the writing application supply these audits in a correct JSON format. At least you might add a second column and use a trigger to keep this synchronised. As a last resort you can use REPLACE
to repair your string:
REPLACE(REPLACE(REPLACE(@YourString,'"{','{'),'}"','}'),'""','"');
With a lot of rows this might take a while... That's why it's better to keep the format in proper JSON.
Just to show the principles:
DECLARE @YourString NVARCHAR(MAX)=N'"{""Id"":""44de2468"",""RecordType"":20,""CreationTime"":""2018-08-03T12:30:34"",""Operation"":""ViewReport"",""OrganizationId"":""779558"",""UserType"":0,""UserKey"":""FFFA3DA"",""Workload"":""PowerBI"",""UserId"":""[email protected]"",""ClientIP"":""9.5.3.26"",""UserAgent"":""Mozilla\/5.0 (Windows NT 10.0;"",""Activity"":""ViewReport"",""ItemName"":""Sales"",""WorkSpaceName"":""TeamITO"",""DatasetName"":""Sales1"",""ReportName"":""Sales1"",""WorkspaceId"":""e8eaa0ca"",""ObjectId"":""Sales1"",""DatasetId"":""4c5d-ad45-eb6546"",""ReportId"":""4cb0-99ad-de41b5160c47"",""IsSuccess"":true,""DatapoolRefreshScheduleType"":""None"",""DatapoolType"":""Undefined""}"';
SET @YourString = REPLACE(REPLACE(REPLACE(@YourString,'"{','{'),'}"','}'),'""','"');
Your string will look like this now:
{"Id":"44de2468","RecordType":20,"CreationTime":"2018-08-03T12:30:34","Operation":"ViewReport","OrganizationId":"779558","UserType":0,"UserKey":"FFFA3DA","Workload":"PowerBI","UserId":"[email protected]","ClientIP":"9.5.3.26","UserAgent":"Mozilla\/5.0 (Windows NT 10.0;","Activity":"ViewReport","ItemName":"Sales","WorkSpaceName":"TeamITO","DatasetName":"Sales1","ReportName":"Sales1","WorkspaceId":"e8eaa0ca","ObjectId":"Sales1","DatasetId":"4c5d-ad45-eb6546","ReportId":"4cb0-99ad-de41b5160c47","IsSuccess":true,"DatapoolRefreshScheduleType":"None","DatapoolType":"Undefined"}
This query will return all your columns as a drived list:
SELECT *
FROM OPENJSON(@YourString);
The result brings back a list with a type hint (while the actual type of "value" is nvarchar
):
+-----------------------------+-------------------------------+------+
| key | value | type |
+-----------------------------+-------------------------------+------+
| Id | 44de2468 | 1 |
+-----------------------------+-------------------------------+------+
| RecordType | 20 | 2 |
+-----------------------------+-------------------------------+------+
| CreationTime | 2018-08-03T12:30:34 | 1 |
+-----------------------------+-------------------------------+------+
| Operation | ViewReport | 1 |
+-----------------------------+-------------------------------+------+
| OrganizationId | 779558 | 1 |
+-----------------------------+-------------------------------+------+
| UserType | 0 | 2 |
+-----------------------------+-------------------------------+------+
| UserKey | FFFA3DA | 1 |
+-----------------------------+-------------------------------+------+
| Workload | PowerBI | 1 |
+-----------------------------+-------------------------------+------+
| UserId | [email protected] | 1 |
+-----------------------------+-------------------------------+------+
| ClientIP | 9.5.3.26 | 1 |
+-----------------------------+-------------------------------+------+
| UserAgent | Mozilla/5.0 (Windows NT 10.0; | 1 |
+-----------------------------+-------------------------------+------+
| Activity | ViewReport | 1 |
+-----------------------------+-------------------------------+------+
| ItemName | Sales | 1 |
+-----------------------------+-------------------------------+------+
| WorkSpaceName | TeamITO | 1 |
+-----------------------------+-------------------------------+------+
| DatasetName | Sales1 | 1 |
+-----------------------------+-------------------------------+------+
| ReportName | Sales1 | 1 |
+-----------------------------+-------------------------------+------+
| WorkspaceId | e8eaa0ca | 1 |
+-----------------------------+-------------------------------+------+
| ObjectId | Sales1 | 1 |
+-----------------------------+-------------------------------+------+
| DatasetId | 4c5d-ad45-eb6546 | 1 |
+-----------------------------+-------------------------------+------+
| ReportId | 4cb0-99ad-de41b5160c47 | 1 |
+-----------------------------+-------------------------------+------+
| IsSuccess | true | 3 |
+-----------------------------+-------------------------------+------+
| DatapoolRefreshScheduleType | None | 1 |
+-----------------------------+-------------------------------+------+
| DatapoolType | Undefined | 1 |
+-----------------------------+-------------------------------+------+
And even better, you can add a WITH
clause like here:
SELECT *
FROM OPENJSON(@YourString)
WITH
(
Id varchar(200) '$.Id',
RecordType int '$.RecordType',
CreationTime datetime '$.CreationTime'
--Add all your known columns here...
)
Doing so you get your values typed and side-by-side
+----------+------------+-------------------------+
| Id | RecordType | CreationTime |
+----------+------------+-------------------------+
| 44de2468 | 20 | 2018-08-03 12:30:34.000 |
+----------+------------+-------------------------+
Upvotes: 1
Reputation: 8101
It looks like you're dealing with a malformed JSON column here. Those double-double quotes are a bother.
If you can clean up the formatting, though, you can just use JSON functions in your query.
First, set up the data (using the data you provided in your other copy of this question,(Split column values into multiple columns):
DECLARE @t TABLE
(
RecordType NVARCHAR(20)
,AuditDate NVARCHAR(MAX)
);
INSERT @t
(
RecordType
,AuditDate
)
VALUES
('View', '{""Id"":""44de2468"",""Type"":20,""CreationDate"":""2018-08-23""}')
,('Edit', '{""Id"":""44de2467"",""Type"":40,""CreationDate"":""2018-08-24""}')
,('Print', '{""Id"":""44de2768"",""Type"":60,""CreationDate"":""2018-05-06""}')
,('Delete', '{""Id"":""44de2488"",""Type"":30,""CreationDate"":""2018-07-20""}');
Now, clean up the malformed JSONs by replacing the double-double quotes with single double quotes.
UPDATE @t
SET AuditDate = REPLACE(AuditDate,'""','"');
Validate the JSONs look good.
SELECT * FROM @t
--Results:
+------------+---------------------------------------------------------+
| RecordType | AuditDate |
+------------+---------------------------------------------------------+
| View | {"Id":"44de2468","Type":20,"CreationDate":"2018-08-23"} |
| Edit | {"Id":"44de2467","Type":40,"CreationDate":"2018-08-24"} |
| Print | {"Id":"44de2768","Type":60,"CreationDate":"2018-05-06"} |
| Delete | {"Id":"44de2488","Type":30,"CreationDate":"2018-07-20"} |
+------------+---------------------------------------------------------+
Then use JSON_VALUE()
to extract the parts you're interested in.
SELECT
RecordType
, JSON_VALUE(AuditDate, '$.Id') AS [Id]
, JSON_VALUE(AuditDate, '$.Type') AS [Type]
, JSON_VALUE(AuditDate, '$.CreationDate') AS CreationDate
FROM @t
--Results
+------------+----------+------+--------------+
| RecordType | Id | Type | CreationDate |
+------------+----------+------+--------------+
| View | 44de2468 | 20 | 2018-08-23 |
| Edit | 44de2467 | 40 | 2018-08-24 |
| Print | 44de2768 | 60 | 2018-05-06 |
| Delete | 44de2488 | 30 | 2018-07-20 |
+------------+----------+------+--------------+
Upvotes: 2
Reputation: 7918
This is pretty simple, all you need is a string "splitter" (AKA tokenizer). If you are on SQL 2016+ you can use STRING_SPLIT
; if you're on a pre-2016 system you can use DelimitedSplit8K on 2005+ or DelimitedSplit8K_LEAD on 2012+. The solution would look like this:
DECLARE @AuditDate VARCHAR(8000) =
'"{""Id"":""44de2468"",""RecordType"":20,""CreationTime"":""2018-08-03T12:30:34"",""Operation"":""ViewReport"",""OrganizationId"":""779558"",""UserType"":0,""UserKey"":""FFFA3DA"",""Workload"":""PowerBI"",""UserId"":""[email protected]"",""ClientIP"":""9.5.3.26"",""UserAgent"":""Mozilla\/5.0 (Windows NT 10.0;"",""Activity"":""ViewReport"",""ItemName"":""Sales"",""WorkSpaceName"":""TeamITO"",""DatasetName"":""Sales1"",""ReportName"":""Sales1"",""WorkspaceId"":""e8eaa0ca"",""ObjectId"":""Sales1"",""DatasetId"":""4c5d-ad45-eb6546"",""ReportId"":""4cb0-99ad-de41b5160c47"",""IsSuccess"":true,""DatapoolRefreshScheduleType"":""None"",""DatapoolType"":""Undefined""}"'
SELECT
Id = MAX(CASE split.attrib WHEN 'ID' THEN split.val END),
RecordType = MAX(CASE split.attrib WHEN 'RecordType' THEN split.val END),
CreationTime = MAX(CASE split.attrib WHEN 'CreationTime' THEN split.val END),
Operation = MAX(CASE split.attrib WHEN 'Operation' THEN split.val END),
OrganizationId = MAX(CASE split.attrib WHEN 'OrganizationId' THEN split.val END),
UserType = MAX(CASE split.attrib WHEN 'UserType' THEN split.val END)
FROM
(
SELECT attrib = REPLACE(REPLACE(SUBSTRING(split.value, 1, mid.point-1),'{',''),'"',''),
val = REPLACE(REPLACE(SUBSTRING(split.value, mid.point+1, 8000),'{',''),'"','')
FROM STRING_SPLIT(@AuditDate,',') AS split
CROSS APPLY (VALUES(CHARINDEX(':', split.value))) AS mid(point)
WHERE REPLACE(REPLACE(SUBSTRING(split.value, 1, mid.point-1),'{',''),'"','') IN
('id','RecordType','CreationTime','Operation','OrganizationID','UserType')
) AS split;
Results:
Id RecordType CreationTime Operation OrganizationId UserType
---------- ----------- --------------------- ----------- --------------- ---------
44de2468 20 2018-08-03T12:30:34 ViewReport 779558 0
Upvotes: 2