Khumar

Reputation: 336

How to execute CQL query using pyspark

I want to execute a Cassandra CQL query using PySpark, but I can't find a way to do it. I can load the whole table into a DataFrame, create a temp view, and query that:

df = spark.read.format("org.apache.spark.sql.cassandra") \
        .options(table="country_production2", keyspace="country") \
        .load()
df.createOrReplaceTempView("Test")

Please suggest a better way so that I can execute CQL queries in PySpark.

Upvotes: 0

Views: 976

Answers (2)

Alex Ott

Reputation: 87164

In PySpark you're using SQL, not CQL. If the SQL query can be mapped to CQL, i.e., you're filtering by partition key or primary key, then the Spark Cassandra Connector (SCC) will translate the query into CQL and execute it (so-called predicate pushdown). If it can't, Spark will load all the data via SCC and perform the filtering at the Spark level.

So after you've registered the temporary view, you can do:

result = spark.sql("select ... from Test where ...")

and work with the results in the result variable. To check whether predicate pushdown happened, execute result.explain() and look for the * marker on the conditions in the PushedFilters section.
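
For example, here is a minimal sketch using the table from the question (country_production2 in keyspace country), assuming a hypothetical partition key column named country_name:

df = spark.read.format("org.apache.spark.sql.cassandra") \
        .options(table="country_production2", keyspace="country") \
        .load()
df.createOrReplaceTempView("Test")

# Filtering on the (hypothetical) partition key column lets SCC push the predicate down
result = spark.sql("select * from Test where country_name = 'India'")

# Pushed predicates appear under PushedFilters in the physical plan
result.explain()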

Upvotes: 0

suresiva

Reputation: 3173

Spark SQL doesn't support Cassandra's CQL dialect directly. It only lets you load the table as a DataFrame and operate on it.

If you are concerned about reading a whole table just to query it, you can use filters as shown below to let Spark push the predicate down and load only the data you need.

from pyspark.sql.functions import col

# Filter on a key column so the connector can push the predicate down to Cassandra
df = spark.read \
          .format("org.apache.spark.sql.cassandra") \
          .options(table=table_name, keyspace=keys_space_name) \
          .load() \
          .filter(col("id") == "A")

df.createOrReplaceTempView("Test")
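
As a quick check (a sketch, assuming the filtered df above and that id is part of the table's primary key), the pushed predicate should appear in the physical plan:

# If the filter was pushed down, something like EqualTo(id,A) shows up under PushedFilters
df.explain()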

Upvotes: 1
