lord.fist

Reputation: 435

Datastax C# driver 3.3.0 deadlocking on connect to cluster?

To Datastax C# driver engineers:

C# driver 3.3.0 deadlocks when calling Connect(). The following code snippet deadlocks in a Windows Forms app while trying to connect:

    public void SimpleConnectTest()
    {
        const string ip = "127.0.0.1";
        const string keyspace = "somekeyspace";

        QueryOptions queryOptions = new QueryOptions();
        queryOptions.SetConsistencyLevel(ConsistencyLevel.One);

        Cluster cluster = Cluster.Builder()
            .AddContactPoints(ip)
            .WithQueryOptions(queryOptions)
            .Build();

        var cassandraSession = cluster.Connect(keyspace);

        Assert.AreNotEqual(null, cassandraSession);

        cluster.Dispose();
    }

Deadlocking happens here, in Cluster.cs:

    private void Init()
    {
        ...
        TaskHelper.WaitToComplete(_controlConnection.Init(), initialAbortTimeout);
        ...
    }

I have tested this against Cassandra 3.9.0 (CQL spec 3.4.2) on my local machine.

Everything blocks on the call to _controlConnection.Init(); the debugger shows the task as:

    task = Id = 11, Status = WaitingForActivation, Method = "{null}", Result = "{Not yet computed}"

After 30000 ms of waiting, it gives up and throws:

    throw new TimeoutException(
        "Cluster initialization was aborted after timing out. This mechanism is put in place to" +
        " avoid blocking the calling thread forever. This usually caused by a networking issue" +
        " between the client driver instance and the cluster.", ex);
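
The 30000 ms wait followed by a TimeoutException matches a blocking wait guarded by a timeout: the timeout does not resolve the deadlock, it only turns an endless hang into an exception. A minimal sketch of that pattern, using a hypothetical WaitOrThrow helper (this is not the driver's actual TaskHelper.WaitToComplete):

```csharp
using System;
using System.Threading.Tasks;

static class WaitHelper
{
    // Hypothetical helper: blocks the calling thread until the task completes
    // or the timeout elapses. On timeout it throws, so the caller is not
    // blocked forever; the underlying task may still never complete.
    public static void WaitOrThrow(Task task, int timeoutMs)
    {
        // Task.Wait(int) returns false if the timeout elapses first.
        // If the task faults, Wait throws an AggregateException instead.
        if (!task.Wait(timeoutMs))
        {
            throw new TimeoutException(
                $"Initialization did not complete within {timeoutMs} ms.");
        }
    }
}
```

A task that never completes (for example, one stuck behind a deadlocked continuation) always ends up in the TimeoutException branch, no matter how long the timeout is.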

Running the same test against 3.2.0 works without problems. Can anyone else reproduce this? Maybe it only happens on my machine.

Edit:

Here is the screenshot of the deadlock:

[Screenshot: deadlocked tasks with blocking awaiting]

Upvotes: 2

Views: 818

Answers (3)

jorgebg

Reputation: 6600

Thanks to the details in your comments, we were able to identify the underlying issue.

As Luke suggested, some ConfigureAwait(false) calls were missing.

This issue affects users who call Cluster.Connect() in environments that have a SynchronizationContext, which is not a common use case:

  • For Windows Forms, it's unlikely that the app communicates directly with a database (without a service in the middle). Furthermore, users should call Connect() before creating a form (when there is no SynchronizationContext yet) so that the same Session instance is shared across all forms.
  • For ASP.NET, users should call Connect() outside of any endpoint action, before the HttpContext is created (when there is no SynchronizationContext).
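
A minimal sketch of the mechanism, assuming a console app in which a hypothetical BlockedUiContext stands in for a Windows Forms context whose thread is stuck in a blocking wait, and a hypothetical InitAsync stands in for an internal driver step. Capturing the context (ConfigureAwait(true), which is what a plain await does) makes the continuation depend on the blocked thread; ConfigureAwait(false) lets it finish on the thread pool:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Models a UI thread's SynchronizationContext while that thread is blocked:
// posted continuations are never executed.
sealed class BlockedUiContext : SynchronizationContext
{
    public override void Post(SendOrPostCallback d, object state)
    {
        // A real UI context would queue d to the message loop, but the loop
        // cannot run while the UI thread is stuck in Wait(), so drop it.
    }
}

static class DeadlockDemo
{
    // Hypothetical stand-in for an internal async step such as an Init() call.
    static async Task InitAsync(bool continueOnCapturedContext)
    {
        // With true, the rest of this method is posted to the captured context.
        await Task.Delay(50).ConfigureAwait(continueOnCapturedContext);
    }

    // Blocks the current thread on InitAsync, the way a synchronous Connect()
    // does, and reports whether the task finished within the timeout.
    public static bool BlockingInit(bool continueOnCapturedContext)
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(new BlockedUiContext());
        try
        {
            Task init = InitAsync(continueOnCapturedContext);
            return init.Wait(TimeSpan.FromSeconds(1));
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

Here BlockingInit(true) times out and returns false, leaving the task in WaitingForActivation as in the question, while BlockingInit(false) returns true. Dropping posted callbacks approximates a real UI thread, which would queue the continuation but never get to run it while it is blocked.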

Note that this issue affects only Connect() calls. Other blocking calls like Execute() don't have this issue.

In any case, this issue could be a showstopper for users getting started with the driver, for example, users creating a simple Windows Forms app to try out a concept.

I've submitted a pull request with the fix. It also adds a test that scans the source code for uses of await without a ConfigureAwait() call, to keep this issue from coming back: https://github.com/datastax/csharp-driver/pull/309
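
A minimal sketch of that kind of check, assuming a simple line-based regex; the class and method names are hypothetical and the actual test in the pull request may work differently. In a real test you would feed it the text of each .cs file (for example via File.ReadAllText) and assert that the result is empty:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ConfigureAwaitScanner
{
    // Matches "await" as a whole word (so "awaiting" does not match).
    static readonly Regex AwaitKeyword = new Regex(@"\bawait\b");

    // Returns 1-based line numbers of lines that contain "await" but no
    // "ConfigureAwait(" call. This is a heuristic: it does not skip comments
    // or strings, and it misses awaits whose ConfigureAwait is on another line.
    public static List<int> FindAwaitsWithoutConfigureAwait(string source)
    {
        var offending = new List<int>();
        string[] lines = source.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i];
            if (AwaitKeyword.IsMatch(line) && !line.Contains("ConfigureAwait("))
            {
                offending.Add(i + 1);
            }
        }
        return offending;
    }
}
```

For the input "var a = await Foo().ConfigureAwait(false);\nvar b = await Bar();" it reports only line 2.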

You can expect the fix to land in the next patch release.

Upvotes: 2

lord.fist

Reputation: 435

An issue with a workaround has been opened here:

https://datastax-oss.atlassian.net/projects/CSHARP/issues/CSHARP-579

For anyone experiencing the same problem: just run your connection code in a new task.

    Task.Run(() =>
    {
        SimpleConnectTest();
    });
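
This works because a delegate passed to Task.Run executes on a thread-pool thread, where SynchronizationContext.Current is null, so awaits inside Connect() have no UI context to capture. A minimal sketch, with hypothetical names (NeverPumpedContext, ConnectLikeAsync, BlockingConnect) standing in for the UI context and the driver's internals. It waits on the task with a timeout only to observe the result; the workaround above simply starts the task:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context that never runs posted callbacks, modelling a UI
// thread that is blocked while waiting for the connection.
sealed class NeverPumpedContext : SynchronizationContext
{
    public override void Post(SendOrPostCallback d, object state) { }
}

static class TaskRunWorkaround
{
    // Stand-in for library code that awaits without ConfigureAwait(false).
    static async Task ConnectLikeAsync()
    {
        await Task.Delay(50); // captures SynchronizationContext.Current, if any
    }

    // Stand-in for a synchronous API (like Connect()) that blocks on async code.
    static void BlockingConnect() => ConnectLikeAsync().Wait();

    // Runs BlockingConnect via Task.Run while the calling thread has a
    // non-pumping context; reports completion and the context seen inside.
    public static bool RunOnThreadPool(out bool contextWasNull)
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(new NeverPumpedContext());
        try
        {
            bool sawNull = false;
            Task work = Task.Run(() =>
            {
                // Thread-pool threads have no SynchronizationContext.
                sawNull = SynchronizationContext.Current == null;
                BlockingConnect();
            });
            bool finished = work.Wait(TimeSpan.FromSeconds(1));
            contextWasNull = sawNull;
            return finished;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

RunOnThreadPool returns true and reports a null context. Blocking the UI thread on such a task would still freeze the form until Connect() returns; it just would not deadlock.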

Upvotes: 0

Luke Tillman

Reputation: 1385

I can't reproduce the problem, but I suspect it might be caused by a recent change that made the connection process asynchronous internally. I don't know for sure, but tracing through the Connect code, I suspect a missing ConfigureAwait(false). In particular, it looks like the Reconnect method (which could definitely get hit as part of that Init code path) has been missing one since that commit. Possibly I can't reproduce it because I'm not hitting the Reconnect code path, while for some reason you are in your environment.

I'm not 100% sure that's the culprit, but I opened a PR to fix it. Stephen Cleary wrote a great explanation of why this can happen in Forms/Web apps. You could try building the driver from my fork to see whether that change fixes the problem, or wait and see what happens with the PR and a new release. If it still happens, I'd suggest opening an issue on JIRA.

Hope that helps!

Upvotes: 1
