Reputation: 183
We are transitioning across to Azure SQL Data Warehouse - and an issue that's been highlighted is the need to change some smaller tables from Round-Robin / Hash-distributed to Replicated to improve performance.
MS Design Guidance (See Here) suggests one criteria for this decision is Tables that take up less than 2Gb Disk Space. i.e. these tables could be made into Replicated tables. They suggest using DBCC PDW_SHOWSPACEUSED to determine this.
I can run this against the whole DB, or one specific table, but i'd really like to get a list of all tables and the space used (preferably in MB) - but it's beyond me.
A lot of google searching either gives me the two basic commands I already know (against the whole DB / against 1 table) or give me SQL Server queries that don't run against Azure DW - e.g. using sys.allocation_units - which is not supported in Azure DW.
Upvotes: 3
Views: 1807
Reputation: 2569
I was just directed to this Microsoft article that provides a pretty solid solution to this problem.
In particular, create a view:
CREATE VIEW dbo.vTableSizes
AS
WITH base
AS (
SELECT
GETDATE() AS [execution_time],
DB_NAME() AS [database_name],
s.name AS [schema_name],
t.name AS [table_name],
QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS [two_part_name],
nt.[name] AS [node_table_name],
ROW_NUMBER() OVER (PARTITION BY
nt.[name]
ORDER BY
(
SELECT
NULL
)
) AS [node_table_name_seq],
tp.[distribution_policy_desc] AS [distribution_policy_name],
c.[name] AS [distribution_column],
nt.[distribution_id] AS [distribution_id],
i.[type] AS [index_type],
i.[type_desc] AS [index_type_desc],
nt.[pdw_node_id] AS [pdw_node_id],
pn.[type] AS [pdw_node_type],
pn.[name] AS [pdw_node_name],
di.name AS [dist_name],
di.position AS [dist_position],
nps.[partition_number] AS [partition_nmbr],
nps.[reserved_page_count] AS [reserved_space_page_count],
nps.[reserved_page_count] - nps.[used_page_count] AS [unused_space_page_count],
nps.[in_row_data_page_count] + nps.[row_overflow_used_page_count] + nps.[lob_used_page_count] AS [data_space_page_count],
nps.[reserved_page_count] - (nps.[reserved_page_count] - nps.[used_page_count])
- ([in_row_data_page_count] + [row_overflow_used_page_count] + [lob_used_page_count]) AS [index_space_page_count],
nps.[row_count] AS [row_count]
FROM
sys.schemas s
INNER JOIN
sys.tables t
ON s.[schema_id] = t.[schema_id]
INNER JOIN
sys.indexes i
ON t.[object_id] = i.[object_id]
AND i.[index_id] <= 1
INNER JOIN
sys.pdw_table_distribution_properties tp
ON t.[object_id] = tp.[object_id]
INNER JOIN
sys.pdw_table_mappings tm
ON t.[object_id] = tm.[object_id]
INNER JOIN
sys.pdw_nodes_tables nt
ON tm.[physical_name] = nt.[name]
INNER JOIN
sys.dm_pdw_nodes pn
ON nt.[pdw_node_id] = pn.[pdw_node_id]
INNER JOIN
sys.pdw_distributions di
ON nt.[distribution_id] = di.[distribution_id]
INNER JOIN
sys.dm_pdw_nodes_db_partition_stats nps
ON nt.[object_id] = nps.[object_id]
AND nt.[pdw_node_id] = nps.[pdw_node_id]
AND nt.[distribution_id] = nps.[distribution_id]
AND i.[index_id] = nps.[index_id]
LEFT OUTER JOIN
(
SELECT
*
FROM
sys.pdw_column_distribution_properties
WHERE
distribution_ordinal = 1
) cdp
ON t.[object_id] = cdp.[object_id]
LEFT OUTER JOIN
sys.columns c
ON cdp.[object_id] = c.[object_id]
AND cdp.[column_id] = c.[column_id]
WHERE
pn.[type] = 'COMPUTE'),
size
AS ( SELECT
[execution_time],
[database_name],
[schema_name],
[table_name],
[two_part_name],
[node_table_name],
[node_table_name_seq],
[distribution_policy_name],
[distribution_column],
[distribution_id],
[index_type],
[index_type_desc],
[pdw_node_id],
[pdw_node_type],
[pdw_node_name],
[dist_name],
[dist_position],
[partition_nmbr],
[reserved_space_page_count],
[unused_space_page_count],
[data_space_page_count],
[index_space_page_count],
[row_count],
([reserved_space_page_count] * 8.0) AS [reserved_space_KB],
([reserved_space_page_count] * 8.0) / 1000 AS [reserved_space_MB],
([reserved_space_page_count] * 8.0) / 1000000 AS [reserved_space_GB],
([reserved_space_page_count] * 8.0) / 1000000000 AS [reserved_space_TB],
([unused_space_page_count] * 8.0) AS [unused_space_KB],
([unused_space_page_count] * 8.0) / 1000 AS [unused_space_MB],
([unused_space_page_count] * 8.0) / 1000000 AS [unused_space_GB],
([unused_space_page_count] * 8.0) / 1000000000 AS [unused_space_TB],
([data_space_page_count] * 8.0) AS [data_space_KB],
([data_space_page_count] * 8.0) / 1000 AS [data_space_MB],
([data_space_page_count] * 8.0) / 1000000 AS [data_space_GB],
([data_space_page_count] * 8.0) / 1000000000 AS [data_space_TB],
([index_space_page_count] * 8.0) AS [index_space_KB],
([index_space_page_count] * 8.0) / 1000 AS [index_space_MB],
([index_space_page_count] * 8.0) / 1000000 AS [index_space_GB],
([index_space_page_count] * 8.0) / 1000000000 AS [index_space_TB]
FROM
base)
SELECT
*
FROM
size;
Then, this "Table space summary" query provides a list of tables and how much space each is currently using (among other information):
SELECT
database_name,
schema_name,
table_name,
distribution_policy_name,
distribution_column,
index_type_desc,
COUNT(DISTINCT partition_nmbr) AS nbr_partitions,
SUM(row_count) AS table_row_count,
SUM(reserved_space_GB) AS table_reserved_space_GB,
SUM(data_space_GB) AS table_data_space_GB,
SUM(index_space_GB) AS table_index_space_GB,
SUM(unused_space_GB) AS table_unused_space_GB
FROM
dbo.vTableSizes
GROUP BY
database_name,
schema_name,
table_name,
distribution_policy_name,
distribution_column,
index_type_desc
ORDER BY
table_reserved_space_GB DESC;
Here's a sample execution against one of my databases showing the table names sorted by space used.
Upvotes: 0