Vinicius Bass

Reputation: 115

Why is connecting to AWS keyspaces so slow with python's cassandra-driver?

I have an API, a Flask application written in Python and deployed on AWS EC2. Some endpoints need to connect to Amazon Keyspaces to run a query. But cluster.connect() is too slow: it takes about 5 seconds to connect before the query runs.

What I did to solve it was to open a connection when the application starts (it is redeployed through CodePipeline whenever a commit lands on the master branch) and then keep that connection open the whole time.

I didn't find anything in the Python Cassandra driver documentation against this. Is there any potential problem with this solution?

Upvotes: 2

Views: 659

Answers (2)

MikeJPR

Reputation: 812

Could you provide the current connection configuration?

Amazon Keyspaces uses Transport Layer Security (TLS) communication by default. If you're not providing the certificate on connection, adding it could help speed things up. For a complete example, check out the Keyspaces Python Sample.

You can also try disabling the following options, which should make the initial connection quicker.

schema_metadata_enabled = False
token_metadata_enabled = False 

Python Driver Documentation

    from cassandra.cluster import Cluster
    from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED
    import boto3
    from cassandra_sigv4.auth import SigV4AuthProvider

    # TLS context that verifies the server against the Starfield root certificate
    ssl_context = SSLContext(PROTOCOL_TLSv1_2)
    ssl_context.load_verify_locations('path_to_file/sf-class2-root.crt')
    ssl_context.verify_mode = CERT_REQUIRED

    # SigV4 authentication using the default boto3 credential chain
    boto_session = boto3.Session()
    auth_provider = SigV4AuthProvider(boto_session)

    cluster = Cluster(['cassandra.us-east-2.amazonaws.com'], ssl_context=ssl_context, auth_provider=auth_provider,
                      port=9142)

    # Skip schema and token metadata discovery to shorten the initial connect
    cluster.schema_metadata_enabled = False
    cluster.token_metadata_enabled = False

    session = cluster.connect()
    r = session.execute('select * from system_schema.keyspaces')
    print(r.current_rows)
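To see whether the certificate or the metadata options actually change anything, it may help to time the connect call before and after. A minimal sketch, using only the standard library; `timed` is a hypothetical helper name, not part of the driver:

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

With the cluster object from the example above, this would be called as `session, seconds = timed(cluster.connect)` and the value of `seconds` compared across configurations.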

Upvotes: 3

Alex Ott

Reputation: 87224

This is the recommended approach: open the connection at startup and keep it (and have one connection per application). Opening a connection to a Cassandra cluster is an expensive operation because, besides establishing the connection itself, the driver discovers the topology of the cluster, calculates token ranges, and does many other things. For "normal" Cassandra this usually doesn't take very long (but is still expensive), and AWS's emulation may add additional latency on top of it.
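One way to get "one connection per application" is a small holder that connects once and hands the same session to every request. This is only a sketch: `SharedSession` and its `connect` argument are hypothetical names, and the callable you pass in is assumed to wrap something like `cluster.connect()`.

```python
import threading
from typing import Any, Callable, Optional


class SharedSession:
    """Opens a session once and returns the same object to every caller."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        # connect is assumed to be something like: lambda: cluster.connect()
        self._connect = connect
        self._session: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Double-checked locking: only the first caller pays the connect cost,
        # even if several request threads arrive at the same time.
        if self._session is None:
            with self._lock:
                if self._session is None:
                    self._session = self._connect()
        return self._session
```

In a Flask app you could create `shared = SharedSession(lambda: cluster.connect())` at import time, call `shared.get()` once during startup to warm it, and use `shared.get().execute(...)` inside the endpoints. If the app runs under a pre-forking server, I believe each worker process should build its own session after the fork, so treat that as something to verify for your deployment.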

Upvotes: 3
