Stinus
Stinus

Reputation: 329

Calculate total storage per column in table

I was trying to decide how much the total storage is in a big table per column. There are multiple nvarchar in it.

One column is nvarchar(max) and on importing, the text of an XML is put in it. After the record is processed correctly. The column is emptied again to an empty string.

In the SQL Server "Disk Usage by Top Tables" report, I see the following.

Amount Records: 1 808 604

Reserved (KB): 15 209 272

Data (KB): 14 466 776

Indexes (KB): 731 896

Unused (KB): 10 600

So I was searching on how to find where this enormous amount of data is in the table. Because the nvarchar(max) columns are almost all empty.

I did a sum(datalength(columnname)) on all the columns in that table. And made a sum of all the column values.

This gave me: 499 344 838 bytes = 0.499344838 gigabytes

So I'm wondering now:

Edit: I've done some tests with inserting data (see below for loop)

With this test, its not as much data difference, +- 23 vs 27 MB.

I will try to do a rebuild on the production table, but I will need to schedule it.

Edit2: I did a test with loop 1000 times: insert 100 + set xml='' where xml<>'' Result after that is 264.008KB data. If I do then a rebuild, it goes down to 22.944KB.

So it looks like the rebuild can offer a solution. But any idea on how intensive this can be on a production environment? And if its possible to detect it in my application if I need to execute it?

Table definition with indexes

CREATE TABLE [dbo].[BATCH](
    [BATCH_PID] [bigint] IDENTITY(1,1) NOT NULL,
    [XML_CREATION_DATE] [datetime] NULL,
    [BATCH_REFERENCE] [nvarchar](50) NULL,
    [SOURCE] [nvarchar](50) NULL,
    [DOCUMENT_CLASS_FID] [int] NULL,
    [XML_NAME] [nvarchar](150) NULL,
    [XML_TYPE] [int] NULL,
    [XML] [nvarchar](max) NULL,
    [NUM_OF_DOCUMENTS] [int] NULL,
    [NUM_OF_IMAGES] [int] NULL,
    [PRIORITY] [int] NULL,
    [STATUS] [int] NULL,
    [USER_FID] [int] NULL,
    [EXTENAL_USER] [nvarchar](50) NULL,
    [REMARKS] [nvarchar](max) NULL,
    [XML_PATH] [nvarchar](max) NULL,
    [BATCH_CREATION_DATE] [datetime] NULL,
    [BATCH_PROCESS_DATE] [datetime] NULL,
    [Action] [int] NULL,
    [IMPORT_LOCATION_FID] [bigint] NULL,
    [QUARANTINE_LOCATION_FID] [bigint] NULL,
    [QUARANTINE_DATE] [datetime] NULL,
    [QUARANTINE] [bit] NULL,
    [DOCS_ON_ERROR] [varchar](255) NULL,
    [CAPTURE_XML] [nvarchar](max) NULL,
    [IGNORE_PAC] [bit] NULL,
    [APPLICATION] [int] NULL,
    [EXTRA_INFO] [nvarchar](max) NULL,
    [INPUT_TEXT] [nvarchar](max) NULL,
    [PROCESS_TIME_BATCH] [int] NULL,
    [PROCESS_TIME_DOCUMENT] [int] NULL,
    [PROCESS_TIME_IMAGE] [int] NULL,
    [BATCH_SIZE] [int] NULL,
    [RULES] [nvarchar](1000) NULL,
    [KEEP_XML] [bit] NULL,
PRIMARY KEY CLUSTERED 
(
    [BATCH_PID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]


CREATE NONCLUSTERED INDEX [IDX_BATCH_Action] ON [dbo].[BATCH]
(
    [Action] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_BATCH_CREATION_DATE] ON [dbo].[BATCH]
(
    [BATCH_CREATION_DATE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_BATCH_SIZE] ON [dbo].[BATCH]
(
    [BATCH_SIZE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_DOCUMENT_CLASS_FID] ON [dbo].[BATCH]
(
    [DOCUMENT_CLASS_FID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_IMPORT_LOCATION_FID] ON [dbo].[BATCH]
(
    [IMPORT_LOCATION_FID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_PRIORITY] ON [dbo].[BATCH]
(
    [PRIORITY] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_PROCESS_TIME_BATCH] ON [dbo].[BATCH]
(
    [PROCESS_TIME_BATCH] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_PROCESS_TIME_DOCUMENT] ON [dbo].[BATCH]
(
    [PROCESS_TIME_DOCUMENT] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_PROCESS_TIME_IMAGE] ON [dbo].[BATCH]
(
    [PROCESS_TIME_IMAGE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_QUARANTINE] ON [dbo].[BATCH]
(
    [QUARANTINE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_QUARANTINE_LOCATION_FID] ON [dbo].[BATCH]
(
    [QUARANTINE_LOCATION_FID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_SOURCE] ON [dbo].[BATCH]
(
    [SOURCE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_STATUS] ON [dbo].[BATCH]
(
    [STATUS] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_USER_FID] ON [dbo].[BATCH]
(
    [USER_FID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [IDX_BATCH_XML_NAME] ON [dbo].[BATCH]
(
    [XML_NAME] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

I've also created a simple loop to insert data

declare @numInserts int = 10000
declare @buildvarchar int = 10000

declare @maxvarchar nvarchar(max) = N'X'

begin

 while @buildvarchar > 0
 begin
  set @maxvarchar = @maxvarchar + N'X'
  set @buildvarchar = @buildvarchar - 1
 end


 while @numInserts > 0
 begin
INSERT [dbo].[BATCH] ([XML_CREATION_DATE], [BATCH_REFERENCE], [SOURCE], [DOCUMENT_CLASS_FID], [XML_NAME], [XML_TYPE], [XML], [NUM_OF_DOCUMENTS], [NUM_OF_IMAGES], [PRIORITY], [STATUS], [USER_FID], [EXTENAL_USER], [REMARKS], [XML_PATH], [BATCH_CREATION_DATE], [BATCH_PROCESS_DATE], [Action], [IMPORT_LOCATION_FID], [QUARANTINE_LOCATION_FID], [QUARANTINE_DATE], [QUARANTINE], [DOCS_ON_ERROR], [CAPTURE_XML], [IGNORE_PAC], [APPLICATION], [EXTRA_INFO], [INPUT_TEXT], [PROCESS_TIME_BATCH], [PROCESS_TIME_DOCUMENT], [PROCESS_TIME_IMAGE], [BATCH_SIZE], [RULES], [KEEP_XML])
VALUES (CAST(N'2017-09-21T14:56:46.000' AS DateTime), N'', N'iDesk', 1, N'21-09-2017-14-44-58-501574', 2, 
@maxvarchar, 0, 0, 1, 9, 1, N'', N'', N'D:\BaseDir\', CAST(N'2017-09-21T14:56:46.000' AS DateTime), CAST(N'2017-09-21T14:56:46.000' AS DateTime), 3, 1, 0, CAST(N'1900-01-01T00:00:00.000' AS DateTime), 0, N'1', NULL, NULL, 4, NULL, N'', 412, 0, 0, 0, N'', 0)

        set @numInserts = @numInserts - 1
    end
end

Upvotes: 4

Views: 1444

Answers (2)

KumarHarsh
KumarHarsh

Reputation: 5094

•Is it not possible to calculate the size of content with sum(datalength(..))

Create table with only one column and one value.

CREATE TABLE [dbo].[BATCH2] ([BATCH_PID] INT)

INSERT INTO [BATCH2] VALUES (1)



SELECT DATALENGTH(BATCH_PID) FROM BATCH2

This will return 4 ,as we know that INT Storage is 4 byte.

If we use, sp_spaceused BATCH2

You will notice data column is 8KB. Why ?

As we know data is store in Page, and each page occupy 8KB.

so in our example even one row will need 1 Page i.e 8KB.

So datalength will give you size data .

•Is that calculation correct, and where is the other +- 14GB used in the report?

If we insert 10000 rows at first time,then say delete / update some rows of varies length ,then sp_spaceused BATCH2 may not return correct value because Memory is not reclaim with Insert/Update/Delete.When we delete some rows or update some rows then there may be some unused memory in Pages or there may me more pages because of large update.

But it won't reflect immediately.

So from time to time we need to Rebuild Index specially table with so many indexes.

So once we Rebuild indexes, Index are orgainise and data are organise in data pages and index pages, lost memory is reclaim.

ALTER INDEX ALL ON [dbo].[BATCH] reorganize

So sp_spaceused BATCH will give correct data now.

Upvotes: 2

Stinus
Stinus

Reputation: 329

With the following statement, I managed to decrease the table size with 27GB.

ALTER TABLE batch REBUILD WITH (ONLINE = OFF)

Looks like SQL isn’t re-using the space automatically.

I did not find a solution on how to detect if it will clean something in advance. The sum datalength query difference was like 400MB difference.

Upvotes: 2

Related Questions