Reputation: 5361
I would normally load data from Cassandra into Apache Spark this way using Java:
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.cassandra.CassandraSQLContext;

SparkContext sparkContext = StorakleSparkConfig.getSparkContext();
CassandraSQLContext sqlContext = new CassandraSQLContext(sparkContext);
sqlContext.setKeyspace("midatabase");
DataFrame customers = sqlContext.cassandraSql("SELECT email, first_name, last_name FROM store_customer " +
    "WHERE CAST(store_id as string) = '" + storeId + "'");
But imagine I have a sharder and I need to load several partition keys into this DataFrame. I could use WHERE IN (...) in my query (sketched below) and again use the cassandraSql method, but I am reluctant to do that because of the well-known problem that a multi-partition IN query routes the entire result through a single coordinator node, making it a single point of failure. This is explained here:
Is there a way to use several queries but load them into a single DataFrame?
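For reference, the single-query WHERE IN variant I would rather avoid looks roughly like this (a sketch; the hard-coded store IDs are just placeholders):

DataFrame customers = sqlContext.cassandraSql(
    "SELECT email, first_name, last_name FROM store_customer " +
    "WHERE CAST(store_id as string) IN ('1', '2', '3')");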
Upvotes: 3
Views: 1652
Reputation: 2092
One way to do this is to run the individual queries and combine the resulting DataFrames/RDDs with unionAll/union.
SparkContext sparkContext = StorakleSparkConfig.getSparkContext();
CassandraSQLContext sqlContext = new CassandraSQLContext(sparkContext);
sqlContext.setKeyspace("midatabase");
DataFrame customersOne = sqlContext.cassandraSql("SELECT email, first_name, last_name FROM store_customer " +
    "WHERE CAST(store_id as string) = '" + storeId1 + "'");
DataFrame customersTwo = sqlContext.cassandraSql("SELECT email, first_name, last_name FROM store_customer " +
    "WHERE CAST(store_id as string) = '" + storeId2 + "'");
DataFrame allCustomers = customersOne.unionAll(customersTwo);
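If the sharder yields an arbitrary number of partition keys, the same idea generalizes to a loop that folds the per-key DataFrames together. A sketch, assuming a hypothetical storeIds list (StorakleSparkConfig, sqlContext, and the schema are from the question):

import java.util.Arrays;
import java.util.List;

List<String> storeIds = Arrays.asList("1", "17", "42"); // hypothetical shard keys

DataFrame allCustomers = null;
for (String storeId : storeIds) {
    // One single-partition query per key, so each request goes to the
    // replicas owning that partition instead of funnelling everything
    // through one coordinator.
    DataFrame current = sqlContext.cassandraSql(
        "SELECT email, first_name, last_name FROM store_customer " +
        "WHERE CAST(store_id as string) = '" + storeId + "'");
    allCustomers = (allCustomers == null) ? current : allCustomers.unionAll(current);
}

Note that unionAll is cheap here: it only combines the query plans, so the per-key Cassandra scans can still execute as independent tasks.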
Upvotes: 1