Radu Cotofana

Reputation: 117

Couchbase NodeUnavailableException in .NET SDK

We are encountering this exception very often in our production code, without any increase in the number of requests to Couchbase or any memory pressure on the server itself. The node has been allocated 30GB of RAM and usage peaks at around 3GB, yet every now and then this exception is thrown. The bucket is opened only once per application lifetime and only get and upsert operations are performed afterwards. The connection is initialised like this:

Config = new ClientConfiguration()
{
    Servers = serverList,
    UseSsl = false,
    DefaultOperationLifespan = 2500,
    BucketConfigs = new Dictionary<string, BucketConfiguration>
    {
        {
            bucketName,
            new BucketConfiguration
            {
                BucketName = bucketName,
                UseSsl = false,
                DefaultOperationLifespan = 2500,
                PoolConfiguration = new PoolConfiguration
                {
                    MaxSize = 2000,
                    MinSize = 200,
                    SendTimeout = (int)Configuration.Config.Instance.CouchbaseConfig.Timeout
                }
            }
        }
    }
};

Cluster = new Cluster(Config);
Bucket = Cluster.OpenBucket();
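
For context, the subsequent usage is essentially just gets and upserts against that single bucket instance, along these lines (a simplified sketch of our CouchbaseUserOperations wrapper; User is a placeholder for our document type):

// Simplified version of how the shared bucket is used after initialisation.
public User Get(string userKey)
{
    // IBucket.Get<T> returns an IOperationResult<T> with Success/Value.
    var result = Bucket.Get<User>(userKey);
    return result.Success ? result.Value : null;
}

public bool Upsert(string userKey, User user)
{
    // IBucket.Upsert returns an IOperationResult<T>; we only check Success.
    return Bucket.Upsert(userKey, user).Success;
}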

Can you please let me know if this initialisation is correct and, more importantly, what to check on the Couchbase server to find the cause of this issue? I have checked all the logs on the server but could not find anything unusual at the times when these errors are thrown.

Thank you,

Stacktrace:

System.Exception: Couchbase exception
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get()
at ###.API.Services.BaseService`1.SetUserID()
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.EventsService.GetResponse()
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.BaseService`1.Any()
at lambda_method()
at ServiceStack.Host.ServiceRunner`1.Execute()
at ServiceStack.Host.ServiceRunner`1.Process()
at ServiceStack.Host.ServiceExec`1.Execute()
at ServiceStack.Host.ServiceRequestExec`2.Execute()
at ServiceStack.Host.ServiceController.ManagedServiceExec()
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f()
at ServiceStack.Host.ServiceController.Execute()
at ServiceStack.HostContext.ExecuteService()
at ServiceStack.Host.RestHandler.ProcessRequestAsync()
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest()
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep()
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps()
at System.Web.HttpApplication.BeginProcessRequestNotification()
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
Caused by: System.Exception : Couchbase.Core.NodeUnavailableException: The node 172.31.34.105:11210 that the key was mapped to is either down or unreachable. The SDK will continue to try to connect every 1000ms. Until it can connect every operation routed to it will fail with this exception.
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get()
at ###.API.Services.BaseService`1.SetUserID()
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.EventsService.GetResponse()
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.BaseService`1.Any()
at lambda_method()
at ServiceStack.Host.ServiceRunner`1.Execute()
at ServiceStack.Host.ServiceRunner`1.Process()
at ServiceStack.Host.ServiceExec`1.Execute()
at ServiceStack.Host.ServiceRequestExec`2.Execute()
at ServiceStack.Host.ServiceController.ManagedServiceExec()
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f()
at ServiceStack.Host.ServiceController.Execute()
at ServiceStack.HostContext.ExecuteService()
at ServiceStack.Host.RestHandler.ProcessRequestAsync()
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest()
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep()
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps()
at System.Web.HttpApplication.BeginProcessRequestNotification()
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()

Upvotes: 2

Views: 1644

Answers (2)

Radu Cotofana

Reputation: 117

The problem was not fully solved, since we still encounter timeouts, although at a lower rate. We did improve performance by using the ClusterHelper singleton instance as follows:

ClusterHelper.Initialize(
    new ClientConfiguration
    {
        Servers = serverList,
        UseSsl = false,
        DefaultOperationLifespan = 2500,
        EnableTcpKeepAlives = true,
        TcpKeepAliveTime = 1000*60*60,
        TcpKeepAliveInterval = 5000,
        BucketConfigs = new Dictionary<string, BucketConfiguration>
        {
            {
                "default",
                new BucketConfiguration
                {
                    BucketName = "default",
                    UseSsl = false,
                    Password = "",
                    PoolConfiguration = new PoolConfiguration
                    {
                        MaxSize = 50,
                        MinSize = 10
                    }
                }
            }
        }
    });
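
Buckets are then retrieved through the helper rather than opened directly, roughly like this (a minimal sketch; the key and document type are placeholders):

// Obtain the shared bucket instance managed by the ClusterHelper singleton.
var bucket = ClusterHelper.GetBucket("default");

// Typical get/upsert usage against the shared bucket.
var result = bucket.Get<string>("some-key");
if (result.Success)
{
    bucket.Upsert("some-key", result.Value);
}

// On application shutdown, release all resources held by the helper.
ClusterHelper.Close();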

Upvotes: 0

jeffrymorris

Reputation: 464

A NodeUnavailableException can be returned for any number of network-related issues. However, since you mentioned you are running on AWS, it's likely that the TCP keep-alive settings need to be tuned on the client.

Your MinSize connection count (200) is so large that you are probably not using them all, and they sit idle until the AWS LB decides to shut them down. When this happens, the SDK will temporarily (for 1000ms) put the failed node into a down state and then try to reconnect. During this time, any keys mapped to that node will fail with this exception.

This blog describes how to set the TCP keep-alives time and interval: http://blog.couchbase.com/introducing-couchbase-.net-sdk-2.1.0-the-asynchronous-couchbase-.net-client

var config = new ClientConfiguration
{
    EnableTcpKeepAlives = true,    // default is true
    TcpKeepAliveTime = 1000*60*60, // 60 minutes of inactivity before keep-alives start
    TcpKeepAliveInterval = 5000    // then a keep-alive probe is sent every 5 seconds
};
var cluster = new Cluster(config);
var bucket = cluster.OpenBucket();

That assumes you are using version 2.1.0 or greater of the client. If you are not, you can do it through the ServicePointManager:

//setting keep-alive time to 200 seconds
ServicePointManager.SetTcpKeepAlive(true, 200000, 1000); 

You'll have to set that to a value less than what the AWS LB is set to (I believe it's 60 seconds).

You should also probably set your connection pool min and max a bit lower, like 5 and 10.
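
Putting those two suggestions together, the bucket configuration would look something like this (just a sketch; bucketName and serverList stand in for your own values):

var config = new ClientConfiguration
{
    Servers = serverList,
    EnableTcpKeepAlives = true,
    TcpKeepAliveTime = 1000 * 60 * 60,
    TcpKeepAliveInterval = 5000,
    BucketConfigs = new Dictionary<string, BucketConfiguration>
    {
        {
            bucketName,
            new BucketConfiguration
            {
                BucketName = bucketName,
                PoolConfiguration = new PoolConfiguration
                {
                    MinSize = 5,   // keep only a handful of idle connections open
                    MaxSize = 10   // grow to at most 10 connections under load
                }
            }
        }
    }
};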

Upvotes: 2
