timgeudens

Reputation: 65

Random connection errors to MS SQL from nodeJS app

We have an AWS server running some NodeJS services. The services that connect to MS SQL crash at random with the message "Failed to connect to databaseserver:1433 - Could not connect (sequence)".

We are running on:

App server: Linux Ubuntu 14.4 on AWS m5; NodeJS 8.11.2. The services use the latest version of the mssql package (4.3.0), which includes tedious 2.7.1.

DB server: Windows Server 2012, SQL Server 2012.

Throughput: about 300 rpm; the error also happens when throughput is lower (about 20 rpm). The app runs in a cluster through PM2 (4 instances). We see the error on all 4 instances at the same time, but sometimes on only 1 or 2.

What we tried:

Connection on app startup:

    App.sqlConnection = new App.SQL.ConnectionPool(config, function(err) {
        if (err) {
            // Initial connection failed: log and exit.
            Log.error(err);
            process.exit(1);
        }

        // Pool-level errors raised after a successful connect.
        App.sqlConnection.on('error', err => {
            Log.error(`There was a connection err : ${err}`);
            process.exit(1);
        });
    });

Request:

    var request = new App.SQL.Request(App.sqlConnection);
    request.query(sQuery, function(err, results) {
    });

Errors are caught by the "on error" handler.

The error happens randomly across services. Some see more instances of the error than others. We are running out of options. Is there any way to get more detailed errors?

Upvotes: 1

Views: 422

Answers (1)

Elliot Nelson

Reputation: 11557

I have a couple of suggestions.

First, how sure are you that these errors are actually a problem? If your code simply retries, instead of exiting, are the connections stable afterwards, or can a connection drop in the middle of a query?

(Connections dropping in the middle of queries are obviously not good, but random failures on connection, that can be fixed by retries, are the best kind of problem to have IMHO.)
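To illustrate the "retry instead of exiting" idea, here is a minimal sketch of a retry wrapper with exponential backoff. The names `connectWithRetry`, `backoffDelay`, `maxAttempts` and `baseDelayMs` are hypothetical and not part of mssql or tedious; the helper accepts any function that returns a promise, so it makes no assumptions about the driver beyond that.

```javascript
// Hypothetical helpers, not part of mssql or tedious.

// Delay before the next attempt. `attempt` is 1-based:
// 1 -> base, 2 -> 2 * base, 3 -> 4 * base, ...
function backoffDelay(attempt, baseDelayMs) {
    return baseDelayMs * Math.pow(2, attempt - 1);
}

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Calls connectFn() until it resolves or maxAttempts is reached.
// Resolves with whatever connectFn resolves with; rejects with the last error.
async function connectWithRetry(connectFn, maxAttempts = 5, baseDelayMs = 200) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return await connectFn();
        } catch (err) {
            lastError = err;
            console.error(`Connect attempt ${attempt}/${maxAttempts} failed: ${err.message}`);
            if (attempt < maxAttempts) {
                await sleep(backoffDelay(attempt, baseDelayMs));
            }
        }
    }
    throw lastError;
}

// Possible usage, assuming `sql = require('mssql')` and the `config`,
// `App` and `Log` objects from the question:
//
//   connectWithRetry(() => new sql.ConnectionPool(config).connect())
//       .then(pool => { App.sqlConnection = pool; })
//       .catch(err => { Log.error(err); process.exit(1); });
```

If the connection usually succeeds on the second or third attempt, that suggests a transient network problem rather than a configuration problem, and exiting only after the final attempt keeps PM2 from restarting the process on every hiccup.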

Ignoring the potential in-code fix, I'm wondering when you say you "duplicated server to new machine" - did you launch a new AMI using latest Windows Server 2012, or did you image and clone? If your database server is a couple years old, you might actually be running outdated network drivers in your instance, which could give you some hiccups.

If you wanted to explore that, you could try rebuilding the entire database server from scratch on a newly launched AMI. Alternatively, you can upgrade the PV driver, network adapter, and EC2Config on your existing instance. You can find the instructions at the following links:

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/Upgrading_PV_drivers.html#aws-pv-upgrade

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/sriov-networking.html#enable-enhanced-networking

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/UsingConfig_Install.html

Upvotes: 0
