cuongle

Reputation: 75306

Long-running stored procedure without keeping the connection open to an Azure database

We have a very long-running stored procedure doing ETL work in order to load data from a raw table into a star schema (fact and dimension tables) in an Azure database.

This stored procedure takes around 10 to 20 hours to run over 10 million rows (using a MERGE statement).

At the moment, we run the stored procedure from C# code (ADO.NET), keeping CommandTimeout = 0 (wait forever). But sometimes the connection is dropped, as the connection to the Azure database is unstable.
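
Roughly, the call today looks like the sketch below (the procedure name and connection string are just placeholders):

using System.Data;
using System.Data.SqlClient;

class EtlRunner
{
    // Current approach (sketch): the connection has to stay open for the whole 10-20 hour run.
    // "dbo.LoadStarSchema" and the connection string are placeholders.
    public static void RunLoad(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.LoadStarSchema", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.CommandTimeout = 0;   // 0 = wait indefinitely
            conn.Open();
            cmd.ExecuteNonQuery();    // blocks until the procedure finishes or the connection drops
        }
    }
}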

Is it possible to run the stored procedure at the database level without keeping the connection open all the time, and then log the progress of the stored procedure in a Progress table so we can track it?
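
Tracking progress from such a Progress table seems straightforward enough with short-lived connections, something like the sketch below (the table and column names are made up, they don't exist yet); the part I cannot see how to do is starting the procedure without holding a connection open for the duration:

using System;
using System.Data.SqlClient;
using System.Threading;

class ProgressMonitor
{
    // Illustrative sketch only: poll a Progress table on a short-lived connection
    // instead of holding one connection open for the whole run.
    // dbo.EtlProgress and the columns Step, RowsProcessed, LoggedAt are made-up names.
    public static void WaitForCompletion(string connectionString)
    {
        while (true)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT TOP (1) Step, RowsProcessed FROM dbo.EtlProgress ORDER BY LoggedAt DESC", conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    if (reader.Read())
                    {
                        var step = reader.GetString(0);
                        Console.WriteLine("Step: {0}, rows so far: {1}", step, reader.GetInt64(1));
                        if (step == "Done") return;
                    }
                }
            }   // connection is closed between polls
            Thread.Sleep(TimeSpan.FromMinutes(5));
        }
    }
}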

I see some recommendations:

  1. SQL Server Agent job: this seems not to be possible, as Azure SQL Database does not support it at the moment.

  2. SqlCommand.BeginExecuteNonQuery: I am not 100% sure whether BeginExecuteNonQuery still keeps the connection open under the hood or not (see the sketch after this list).
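
To spell out option 2, I mean something like the sketch below (same placeholders as above); as far as I can tell, the SqlConnection still has to stay open until EndExecuteNonQuery completes, which is why I am unsure it helps:

using System;
using System.Data;
using System.Data.SqlClient;

class AsyncEtlRunner
{
    // Sketch of option 2: the Begin/End pattern. The connection object must remain
    // open until EndExecuteNonQuery returns, so the process still has to hold it.
    public static void RunLoadAsync(string connectionString)
    {
        var conn = new SqlConnection(connectionString);
        var cmd = new SqlCommand("dbo.LoadStarSchema", conn)
        {
            CommandType = CommandType.StoredProcedure,
            CommandTimeout = 0
        };
        conn.Open();
        IAsyncResult result = cmd.BeginExecuteNonQuery();
        // ... free to do other work here, but conn must not be closed ...
        cmd.EndExecuteNonQuery(result);   // blocks until the procedure finishes
        conn.Close();
    }
}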

Is there any other way to do this?

Upvotes: 5

Views: 1795

Answers (2)

wBob

Reputation: 14389

Azure Data Factory has a Stored Procedure task which could do this. It has an optional timeout property in the policy section; if you leave it out, it defaults to infinite:

"policy": {
           "concurrency": 1,
           "retry": 3
           },

If you specify the timeout as 0 when creating the activity, you'll see it disappear when you provision the task in the portal. You could also try specifying the timeout as 1 day (24 hours), eg "timeout": "1.00:00:00", although I haven't tested that it times out correctly.
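
For example, with the one-day timeout added, the policy section would look something like this:

"policy": {
    "concurrency": 1,
    "retry": 3,
    "timeout": "1.00:00:00"
},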

You could also set the timeout to 0 in the connection string, although again I haven't tested this option, eg

{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=0"
    }
  }
}

I would regard this as more straightforward than Azure Automation but that's a personal choice. Maybe try both options and see which works best for you.

I agree with some of the other comments about the MERGE taking too long for that volume of records. I suspect either your table does not have appropriate indexing to support the MERGE, or you're running on too low a service tier. What service tier are you running on, eg Basic, Standard, Premium (P1-P15)? Consider raising a separate question with the DDL of your table including indexes, some sample data, the MERGE statement and the service tier; I'm sure that can go faster.

As a test / quick fix, you could always refactor the MERGE as the appropriate INSERT / UPDATE / DELETE - I bet it goes faster. Let us know.

The connection between Azure Data Factory and the Azure database should be stable. If it isn't, you can raise support tickets. However, for cloud architecture (and really any architecture) you need to make design decisions that allow for the possibility of things going wrong. That means, architecturally, you have to design for the possibility of the connection dropping or the job failing: for example, make sure your job is restartable from the point of failure, make sure the error reporting is good, etc.

Also, from experience, given your data volumes (which I regard as low), this job is taking far too long. There must be an issue with it or the design. It is my strongest recommendation that you attempt to resolve this issue.

Upvotes: 3
