Reputation: 43

Azure Data Warehouse - Not using all available DWU's

My organisation is currently evaluating Azure Datawarehouse. We have a fact table that contains 16M rows and another that has 5M both of them are hash distributed on the same column (same data type and length).

When an inner join is performed at 200 DWU scale with smallrc resource class the query takes nearly 6 minutes, yet the DWU usage (as shown in portal) is only fraction of all available DWU's. Also, when the same query is initiated multiple times at the same time (through SSIS execute SQL tasks) the datawarehouse always executes 6 instances (maximum).

We have changed the DWU setting from 200 to 500 to 1000 and finally to 2000 but each time the maximum number of concurrent runs of the same query is limited to 6 only and the DWU usage always remains fraction of the total available, there is no noticeable change in performance either.

Is this expected behaviour? surely greatly increasing the DWU's should have provided massive decrease in query execution time? I appreciate that this is very generic but I am trying to understand if I have not considered something which is important for such an analysis?

Here is the Explain:

explain
<?xml version="1.0" encoding="utf-8"?>
<dsql_query number_nodes="10" number_distributions="60" number_distributions_per_node="6">
  <sql>select shifts.[Shift Reference], Shifts.[Trust Code], Shifts.[Ward Code], shifts.[Location Code], Qualification, YYYYMM, booking_removal.[Booking Type], booking_removal.[Booking Type], count(distinct booking_removal.[Booking ID])  from shifts join booking_removal on shifts.[Shift reference] = booking_removal.[Shift Reference] join dimdate on shifts.[Shift Start Date] = date where year in  (2015, 2016)
group by shifts.[Shift Reference], Shifts.[Trust Code], Shifts.[Ward Code], shifts.[Location Code], Qualification, YYYYMM, booking_removal.[Booking Type], booking_removal.[Booking Type]</sql>
  <dsql_operations total_cost="1.22805816" total_number_operations="5">
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_15134</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>      
<sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_15134] ([Date] DATE, [YYYYMM] INT ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="BROADCAST_MOVE">
      <operation_cost cost="1.22805816" accumulative_cost="1.22805816" average_rowsize="7" output_rows="730.987" GroupNumber="15" />                                                                                                                                    
<source_statement>SELECT [T1_1].[Date] AS [Date],
       [T1_1].[YYYYMM] AS [YYYYMM]
FROM   (SELECT [T2_1].[Date] AS [Date],
               [T2_1].[YYYYMM] AS [YYYYMM]
        FROM   [NHSP-Shifts-DW].[dbo].[DimDate] AS T2_1
        WHERE  (([T2_1].[Year] = CAST ((2015) AS INT))
                OR ([T2_1].[Year] = CAST ((2016) AS INT)))) AS T1_1</source_statement>
      <destination_table>[TEMP_ID_15134]</destination_table>
    </dsql_operation>
    <dsql_operation operation_type="RETURN">
      <location distribution="AllDistributions" />
<select>SELECT [T1_1].[Shift Reference] AS [Shift Reference],
       [T1_1].[Trust Code] AS [Trust Code],
       [T1_1].[Ward Code] AS [Ward Code],
       [T1_1].[Location Code] AS [Location Code],
       [T1_1].[Qualification] AS [Qualification],
       [T1_1].[YYYYMM] AS [YYYYMM],
       [T1_1].[Booking Type] AS [Booking Type],
       [T1_1].[Booking Type] AS [Booking Type1],
       [T1_1].[col] AS [col]                                                                                                        
FROM   (SELECT   COUNT([T2_1].[Booking ID]) AS [col],
                 [T2_1].[Shift Reference] AS [Shift Reference],
                 [T2_1].[Trust Code] AS [Trust Code],
                 [T2_1].[Ward Code] AS [Ward Code],
                 [T2_1].[Location Code] AS [Location Code],
                 [T2_1].[Qualification] AS [Qualification],
                 [T2_1].[YYYYMM] AS [YYYYMM],
                 [T2_1].[Booking Type] AS [Booking Type]                                                                   
FROM     (SELECT   [T3_1].[Booking ID] AS [Booking ID],
                           [T3_2].[Shift Reference] AS [Shift Reference],
                           [T3_2].[Trust Code] AS [Trust Code],
                           [T3_2].[Ward Code] AS [Ward Code],
                           [T3_2].[Location Code] AS [Location Code],
                           [T3_2].[Qualification] AS [Qualification],
                           [T3_2].[YYYYMM] AS [YYYYMM],
                           [T3_1].[Booking Type] AS [Booking Type]
FROM     [NHSP-Shifts-DW].[dbo].[Booking_Removal] AS T3_1
                           INNER JOIN
                           (SELECT [T4_2].[Shift Reference] AS [Shift Reference],
                                   [T4_2].[Trust Code] AS [Trust Code],
                                   [T4_2].[Ward Code] AS [Ward Code],
                                   [T4_2].[Location Code] AS [Location Code],
                                   [T4_2].[Qualification] AS [Qualification],
                                   [T4_1].[YYYYMM] AS [YYYYMM]
FROM   [tempdb].[dbo].[TEMP_ID_15134] AS T4_1
                                   INNER JOIN
                                   [NHSP-Shifts-DW].[dbo].[Shifts] AS T4_2
                                   ON ([T4_1].[Date] = [T4_2].[Shift Start Date])) AS T3_2 ON ([T3_1].[Shift Reference] = [T3_2].[Shift Reference])
                  GROUP BY [T3_2].[Shift Reference], [T3_2].[Trust Code], [T3_2].[Ward Code], [T3_2].[Location Code], [T3_2].[Qualification], [T3_2].[YYYYMM], [T3_1].[Booking Type], [T3_1].[Booking ID]) AS T2_1
GROUP BY [T2_1].[Shift Reference], [T2_1].[Trust Code], [T2_1].[Ward Code], [T2_1].[Location Code], [T2_1].[Qualification], [T2_1].[YYYYMM], [T2_1].[Booking Type]) AS T1_1</select>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_15134]</sql_operation>
      </sql_operations>
    </dsql_operation>
  </dsql_operations>
</dsql_query>

When distributing the table I did check for data skew and the [shift reference] column almost provides none. Here are the results:

188648  123888  55888   8872    1   1
189096  123816  55960   9320    1   10
189352  123760  56000   9592    1   11
189488  124000  55960   9528    1   12
189096  123640  55984   9472    2   13
189544  123848  55952   9744    2   14
189024  123656  55952   9416    2   15
188960  123560  55984   9416    2   16
188256  123416  55944   8896    2   17
188840  123544  56000   9296    2   18
189096  123680  55960   9456    2   19
188840  123792  55936   9112    1   2
189288  123672  56000   9616    2   20
188648  123744  55976   8928    2   21
188840  123840  55976   9024    2   22
188776  123696  56032   9048    2   23
188392  123384  55944   9064    2   24
188384  123656  55976   8752    3   25
189096  123992  55968   9136    3   26
189032  123960  55968   9104    3   27
188968  123896  55960   9112    3   28
188776  123560  55976   9240    3   29
188840  123576  55960   9304    1   3
189024  123576  56000   9448    3   30
188768  123448  55920   9400    3   31
188896  123584  55992   9320    3   32
188960  123792  55984   9184    3   33
188456  123280  55992   9184    3   34
189736  124192  55976   9568    3   35
189288  123864  56024   9400    3   36
189168  123624  55976   9568    4   37
189096  123704  56016   9376    4   38
188968  123520  55992   9456    4   39
189672  123816  55952   9904    1   4
188904  123704  55944   9256    4   40
188712  123600  55976   9136    4   41
189160  123688  55976   9496    4   42
189416  124008  55976   9432    4   43
189032  123816  55952   9264    4   44
188968  123680  55984   9304    4   45
189352  123816  55992   9544    4   46
189416  124056  56008   9352    4   47
189608  123984  55992   9632    4   48
189160  123712  55968   9480    5   49
189224  123936  55928   9360    1   5
189480  123880  55968   9632    5   50
189480  123808  56032   9640    5   51
188840  123528  55968   9344    5   52
188960  123632  55936   9392    5   53
189024  123752  55952   9320    5   54
189216  123768  55976   9472    5   55
189600  123928  55960   9712    5   56
188456  123560  55912   8984    5   57
189664  124160  55976   9528    5   58
189096  123824  55976   9296    5   59
189288  123896  55960   9432    1   6
189096  123696  55936   9464    5   60
189224  123664  55952   9608    1   7
188776  123864  55920   8992    1   8
188712  123752  55920   9040    1   9

Upvotes: 1

Answers (2)

Sonya Marshall

Reputation: 506

How many rows is your query returning? If it's returning a lot of rows, the return operation on the client may be the bottleneck and thus scaling DWUs may not help. You mentioned that the currency is always 6. Could it be that this is an SSIS setting? In SQL DW the concurrency varies depending on the amount of DWUs allocated. This is explained in detail in the Concurrency and workload management in SQL Data Warehouse article. To see how many queries are actually running at once, you can run the query.

SELECT * FROM sys.dm_pdw_exec_requests WHERE status = 'Running';

Upvotes: 1

Rob Farley

Reputation: 15849

Your effort per distribution is mostly the same, which is good. Having extra nodes involved can increase the processing power, but the effort in handling a single distribution isn't going to be impacted much. If you were CPU-bound, then sure. If you were IO-bound, then sure. But your query is taking a long time because of the grouping you're doing, and so on. You could improve the query in various ways, but increasing the DWU won't help much, except to allow more processing power for other queries. If you want to tune your query to make it run faster in general, then that's a different kind of question.

Upvotes: 1

Azure Data Warehouse - Not using all available DWU&#39;s

Answers (2)

Related Questions

Azure Data Warehouse - Not using all available DWU's