Reputation: 329
I was trying to decide how much the total storage is in a big table per column. There are multiple nvarchar in it.
One column is nvarchar(max) and on importing, the text of an XML is put in it. After the record is processed correctly. The column is emptied again to an empty string.
In the SQL Server "Disk Usage by Top Tables" report, I see the following.
Amount Records: 1 808 604
Reserved (KB): 15 209 272
Data (KB): 14 466 776
Indexes (KB): 731 896
Unused (KB): 10 600
So I was searching on how to find where this enormous amount of data is in the table. Because the nvarchar(max) columns are almost all empty.
I did a sum(datalength(columnname)) on all the columns in that table. And made a sum of all the column values.
This gave me: 499 344 838 bytes = 0.499344838 gigabytes
So I'm wondering now:
Is it not possible to calculate the size of content with sum(datalength(..))
Is that calculation correct, and where is the other +- 14GB used in the report?
Edit: I've done some tests with inserting data (see below for loop)
ALTER TABLE batch REBUILD WITH (ONLINE = OFF) --> 22 928KB data
DROP & CREATE table
Insert 100K with the XML column '' instead of the nvarchar variable --> 22.864KB data
With this test, its not as much data difference, +- 23 vs 27 MB.
I will try to do a rebuild on the production table, but I will need to schedule it.
Edit2: I did a test with loop 1000 times: insert 100 + set xml='' where xml<>'' Result after that is 264.008KB data. If I do then a rebuild, it goes down to 22.944KB.
So it looks like the rebuild can offer a solution. But any idea on how intensive this can be on a production environment? And if its possible to detect it in my application if I need to execute it?
Table definition with indexes
CREATE TABLE [dbo].[BATCH](
[BATCH_PID] [bigint] IDENTITY(1,1) NOT NULL,
[XML_CREATION_DATE] [datetime] NULL,
[BATCH_REFERENCE] [nvarchar](50) NULL,
[SOURCE] [nvarchar](50) NULL,
[DOCUMENT_CLASS_FID] [int] NULL,
[XML_NAME] [nvarchar](150) NULL,
[XML_TYPE] [int] NULL,
[XML] [nvarchar](max) NULL,
[NUM_OF_DOCUMENTS] [int] NULL,
[NUM_OF_IMAGES] [int] NULL,
[PRIORITY] [int] NULL,
[STATUS] [int] NULL,
[USER_FID] [int] NULL,
[EXTENAL_USER] [nvarchar](50) NULL,
[REMARKS] [nvarchar](max) NULL,
[XML_PATH] [nvarchar](max) NULL,
[BATCH_CREATION_DATE] [datetime] NULL,
[BATCH_PROCESS_DATE] [datetime] NULL,
[Action] [int] NULL,
[IMPORT_LOCATION_FID] [bigint] NULL,
[QUARANTINE_LOCATION_FID] [bigint] NULL,
[QUARANTINE_DATE] [datetime] NULL,
[QUARANTINE] [bit] NULL,
[DOCS_ON_ERROR] [varchar](255) NULL,
[CAPTURE_XML] [nvarchar](max) NULL,
[IGNORE_PAC] [bit] NULL,
[APPLICATION] [int] NULL,
[EXTRA_INFO] [nvarchar](max) NULL,
[INPUT_TEXT] [nvarchar](max) NULL,
[PROCESS_TIME_BATCH] [int] NULL,
[PROCESS_TIME_DOCUMENT] [int] NULL,
[PROCESS_TIME_IMAGE] [int] NULL,
[BATCH_SIZE] [int] NULL,
[RULES] [nvarchar](1000) NULL,
[KEEP_XML] [bit] NULL,
PRIMARY KEY CLUSTERED
(
[BATCH_PID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IDX_BATCH_Action] ON [dbo].[BATCH]
(
[Action] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_BATCH_CREATION_DATE] ON [dbo].[BATCH]
(
[BATCH_CREATION_DATE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_BATCH_SIZE] ON [dbo].[BATCH]
(
[BATCH_SIZE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_DOCUMENT_CLASS_FID] ON [dbo].[BATCH]
(
[DOCUMENT_CLASS_FID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_IMPORT_LOCATION_FID] ON [dbo].[BATCH]
(
[IMPORT_LOCATION_FID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_PRIORITY] ON [dbo].[BATCH]
(
[PRIORITY] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_PROCESS_TIME_BATCH] ON [dbo].[BATCH]
(
[PROCESS_TIME_BATCH] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_PROCESS_TIME_DOCUMENT] ON [dbo].[BATCH]
(
[PROCESS_TIME_DOCUMENT] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_PROCESS_TIME_IMAGE] ON [dbo].[BATCH]
(
[PROCESS_TIME_IMAGE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_QUARANTINE] ON [dbo].[BATCH]
(
[QUARANTINE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_QUARANTINE_LOCATION_FID] ON [dbo].[BATCH]
(
[QUARANTINE_LOCATION_FID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_SOURCE] ON [dbo].[BATCH]
(
[SOURCE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_STATUS] ON [dbo].[BATCH]
(
[STATUS] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_USER_FID] ON [dbo].[BATCH]
(
[USER_FID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_XML_NAME] ON [dbo].[BATCH]
(
[XML_NAME] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
I've also created a simple loop to insert data
declare @numInserts int = 10000
declare @buildvarchar int = 10000
declare @maxvarchar nvarchar(max) = N'X'
begin
while @buildvarchar > 0
begin
set @maxvarchar = @maxvarchar + N'X'
set @buildvarchar = @buildvarchar - 1
end
while @numInserts > 0
begin
INSERT [dbo].[BATCH] ([XML_CREATION_DATE], [BATCH_REFERENCE], [SOURCE], [DOCUMENT_CLASS_FID], [XML_NAME], [XML_TYPE], [XML], [NUM_OF_DOCUMENTS], [NUM_OF_IMAGES], [PRIORITY], [STATUS], [USER_FID], [EXTENAL_USER], [REMARKS], [XML_PATH], [BATCH_CREATION_DATE], [BATCH_PROCESS_DATE], [Action], [IMPORT_LOCATION_FID], [QUARANTINE_LOCATION_FID], [QUARANTINE_DATE], [QUARANTINE], [DOCS_ON_ERROR], [CAPTURE_XML], [IGNORE_PAC], [APPLICATION], [EXTRA_INFO], [INPUT_TEXT], [PROCESS_TIME_BATCH], [PROCESS_TIME_DOCUMENT], [PROCESS_TIME_IMAGE], [BATCH_SIZE], [RULES], [KEEP_XML])
VALUES (CAST(N'2017-09-21T14:56:46.000' AS DateTime), N'', N'iDesk', 1, N'21-09-2017-14-44-58-501574', 2,
@maxvarchar, 0, 0, 1, 9, 1, N'', N'', N'D:\BaseDir\', CAST(N'2017-09-21T14:56:46.000' AS DateTime), CAST(N'2017-09-21T14:56:46.000' AS DateTime), 3, 1, 0, CAST(N'1900-01-01T00:00:00.000' AS DateTime), 0, N'1', NULL, NULL, 4, NULL, N'', 412, 0, 0, 0, N'', 0)
set @numInserts = @numInserts - 1
end
end
Upvotes: 4
Views: 1444
Reputation: 5094
•Is it not possible to calculate the size of content with sum(datalength(..))
Create table with only one column and one value.
CREATE TABLE [dbo].[BATCH2] ([BATCH_PID] INT)
INSERT INTO [BATCH2] VALUES (1)
SELECT DATALENGTH(BATCH_PID) FROM BATCH2
This will return 4 ,as we know that INT Storage
is 4 byte
.
If we use, sp_spaceused BATCH2
You will notice data column is 8KB. Why ?
As we know data is store in Page, and each page occupy 8KB.
so in our example even one row will need 1 Page i.e 8KB.
So datalength will give you size data .
•Is that calculation correct, and where is the other +- 14GB used in the report?
If we insert 10000 rows at first time,then say delete / update some rows of varies length ,then sp_spaceused BATCH2
may not return correct value because Memory is not reclaim with Insert/Update/Delete
.When we delete some rows or update some rows then there may be some unused memory in Pages or there may me more pages because of large update.
But it won't reflect immediately.
So from time to time we need to Rebuild Index
specially table with so many indexes.
So once we Rebuild indexes, Index are orgainise and data are organise in data pages and index pages, lost memory is reclaim.
ALTER INDEX ALL ON [dbo].[BATCH] reorganize
So sp_spaceused BATCH
will give correct data now.
Upvotes: 2
Reputation: 329
With the following statement, I managed to decrease the table size with 27GB.
ALTER TABLE batch REBUILD WITH (ONLINE = OFF)
Looks like SQL isn’t re-using the space automatically.
I did not find a solution on how to detect if it will clean something in advance. The sum datalength query difference was like 400MB difference.
Upvotes: 2