Alan Miranda

Reputation: 183

Problem writing to Amazon Keyspaces with Spark 3.x

I'm trying to write to Amazon Keyspaces, but the following message appears:

[image: screenshot of the error message]

Spark version: 3.0.1
Connector: 3.0
Java: 1.8
Scala: 2.12

These versions follow the compatibility information on GitHub: [image: screenshot from GitHub]

With earlier versions, such as connector 2.5.2 and Spark 2.4.6, it works fine.

Upvotes: 0

Views: 187

Answers (1)

MikeJPR

Reputation: 812

You should be able to connect using Spark 3 and connector 3. Here are some steps to validate that you have set up the connection correctly and have the right permissions.

  • Make sure you have permission to read the system tables.
  • If you have set up a VPC endpoint (VPCE), make sure you have permission to describe VPC endpoints.
  • In your configuration, make sure hostname-validation is set to false in the SSL config.

You should be able to run the following query against your system.peers table and retrieve the IPs of the public or private endpoint. If it returns one peer or none, you need to take the steps above. Remember that the AWS console is not in your VPC and will contact the public endpoint, similar to S3.

SELECT * FROM system.peers
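The peers check can also be scripted. The following is a minimal sketch, not a definitive implementation: it assumes the DataStax Python driver (the `cassandra-driver` package) is installed, and the helper names, the `sf-class2-root.crt` certificate file name, and the default host are illustrative assumptions rather than part of the original answer.

```python
import ssl


def peers_look_healthy(peer_addresses, minimum=2):
    """Return True when at least `minimum` distinct peers were returned.

    The answer treats "1 or no peers" as a sign of a setup problem,
    so the default threshold is two distinct addresses.
    """
    return len(set(peer_addresses)) >= minimum


def fetch_peer_addresses(username, password,
                         host="cassandra.us-east-1.amazonaws.com",
                         port=9142,
                         ca_file="sf-class2-root.crt"):
    """Query system.peers and return the peer addresses as strings.

    Assumptions: `ca_file` is a locally downloaded CA certificate, and
    the credentials are service-specific credentials for Keyspaces.
    """
    # Imported lazily so peers_look_healthy works without the driver.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Mirrors hostname-validation = false from the driver config.
    ssl_context.check_hostname = False
    ssl_context.load_verify_locations(ca_file)

    cluster = Cluster(
        [host],
        port=port,
        ssl_context=ssl_context,
        auth_provider=PlainTextAuthProvider(username=username,
                                            password=password),
    )
    try:
        session = cluster.connect()
        rows = session.execute("SELECT * FROM system.peers")
        return [str(row.peer) for row in rows]
    finally:
        cluster.shutdown()
```

A call such as `peers_look_healthy(fetch_peer_addresses("user-at-sample", "..."))` returning False would suggest revisiting the permission and endpoint steps above.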

Sample policy. You need to provide access to the resource /keyspace/system* and to "ec2:DescribeNetworkInterfaces" and "ec2:DescribeVpcEndpoints" on your VPC.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "cassandra:Select",
            "cassandra:Modify"
         ],
         "Resource":[
            "arn:aws:cassandra:us-east-1:111122223333:/keyspace/mykeyspace/table/mytable",
            "arn:aws:cassandra:us-east-1:111122223333:/keyspace/system*"
         ]
      },
      {
         "Sid":"ListVPCEndpoints",
         "Effect":"Allow",
         "Action":[
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeVpcEndpoints"
         ],
         "Resource":"*"
      }
   ]
}
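To sanity-check a policy document against the requirements described here, a rough sketch like the one below may help. It is deliberately simplified and is only an assumption about how one might check this: it looks at literal action names in Allow statements and ignores Deny statements, conditions, and wildcard actions, so it is not a substitute for real IAM policy evaluation. The function name `missing_pieces` is hypothetical.

```python
REQUIRED_EC2_ACTIONS = {
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeVpcEndpoints",
}


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def missing_pieces(policy):
    """Return a list of requirements the policy does not visibly grant.

    Simplified check: only literal action names in Allow statements
    are considered.
    """
    allowed_actions = set()
    system_tables_readable = False

    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = set(_as_list(statement.get("Action", [])))
        allowed_actions |= actions
        # Reading system tables needs cassandra:Select on /keyspace/system*.
        if "cassandra:Select" in actions:
            for resource in _as_list(statement.get("Resource", [])):
                if resource.endswith("/keyspace/system*"):
                    system_tables_readable = True

    missing = sorted(REQUIRED_EC2_ACTIONS - allowed_actions)
    if not system_tables_readable:
        missing.append("cassandra:Select on /keyspace/system*")
    return missing
```

Loading the sample policy with `json.load` and passing it to `missing_pieces` should give an empty list; a non-empty list names what to add.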

Set up the connection by referencing the external config.

--conf spark.cassandra.connection.config.profile.path=application.conf
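When the job is launched from a script, the same setting is passed as two separate arguments: the flag and a single key=value string. The sketch below only illustrates that shape; the function name and the `my_job.py` application file are hypothetical, and any other flags your job needs are left out.

```python
import shlex


def spark_submit_args(app, profile_path="application.conf"):
    """Build a spark-submit argument list that points the connector
    at an external driver config file."""
    return [
        "spark-submit",
        "--conf",
        f"spark.cassandra.connection.config.profile.path={profile_path}",
        app,  # hypothetical application file
    ]


if __name__ == "__main__":
    # Print the command in a copy-pasteable, shell-quoted form.
    print(shlex.join(spark_submit_args("my_job.py")))
```

The list form can be handed directly to `subprocess.run` without going through a shell.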

Sample driver config.

datastax-java-driver {
  basic.request.consistency = "LOCAL_QUORUM"
  basic.contact-points = ["cassandra.us-east-1.amazonaws.com:9142"]

  advanced.reconnect-on-init = true

  basic.load-balancing-policy {
    local-datacenter = "us-east-1"
  }

  advanced.auth-provider = {
    class = PlainTextAuthProvider
    username = "user-at-sample"
    password = "S@MPLE=PASSWORD="
  }

  advanced.throttler = {
    class = ConcurrencyLimitingRequestThrottler
    max-concurrent-requests = 30
    max-queue-size = 2000
  }

  advanced.ssl-engine-factory {
    class = DefaultSslEngineFactory
    hostname-validation = false
  }

  advanced.connection.pool.local.size = 1
}

Upvotes: 2
