Ivan Stoyanov

Reputation: 5482

How to save a DataFrame from Spark to a Cassandra table while changing the schema and adding additional properties

I have used Spark SQL to retrieve data from a Cassandra database:

DataFrame customers = sqlContext.cassandraSql("SELECT email, first_name, last_name FROM customer " +
                "WHERE CAST(store_id as string) = '" + storeId + "'");

After that I did some filtering, and I want to save this data into another Cassandra table that looks like this:

store_id uuid,
report_name text,
report_time timestamp,
sharder int,
customer_email text,
count int static,
first_name text,
last_name text,
PRIMARY KEY ((store_id, report_name, report_time, sharder), customer_email)

How can I add these additional properties when I save the DataFrame into the new table? Also, what is the best practice for sharding the Cassandra wide row in this example? I expect to have 4k-6k records in the DataFrame, so sharding the wide row is a must, but I am not sure whether counting the records and then bumping the sharder after a fixed number of items is the best practice in Spark or Cassandra.

Upvotes: 2

Views: 3532

Answers (2)

AlexL

Reputation: 761

After you have the DataFrame, you can define a case class that has the structure of the new schema with the added properties.

You can create the case class like this:

case class DataFrameRecord(property1: String, property2: Long, property3: String, property4: Double)

Then you can use map to convert into the new structure using the case class:

df.rdd.map(p => DataFrameRecord(prop1, prop2, prop3, prop4)).toDF()
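To make that concrete, below is a minimal sketch of the full round trip for the table above, assuming Scala and the DataStax spark-cassandra-connector; the keyspace name, report values, and shard count are hypothetical placeholders, and customers is the DataFrame from the question. The connector converts the store_id string to a Cassandra uuid on write, and the static count column is left out (it can be written separately per partition). Hashing the customer email modulo a fixed shard count is one simple way to spread the 4k-6k rows across shards without first counting them:

import java.sql.Timestamp
import org.apache.spark.sql.SaveMode
import sqlContext.implicits._  // enables .toDF() on an RDD of case classes

// Mirrors the target table; field names must match the Cassandra columns.
case class ReportRecord(
    store_id: String,          // the connector converts the string to uuid on write
    report_name: String,
    report_time: Timestamp,
    sharder: Int,
    customer_email: String,
    first_name: String,
    last_name: String)

val reportName = "customer_report"                         // hypothetical
val reportTime = new Timestamp(System.currentTimeMillis())
val numShards  = 8                                         // hypothetical

val reportDF = customers.rdd.map { row =>
  val email = row.getAs[String]("email")
  ReportRecord(
    storeId,
    reportName,
    reportTime,
    (email.hashCode & Int.MaxValue) % numShards,  // deterministic shard per email
    email,
    row.getAs[String]("first_name"),
    row.getAs[String]("last_name"))
}.toDF()

reportDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_keyspace", "table" -> "customer_report"))
  .mode(SaveMode.Append)
  .save()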

Upvotes: 3

Greg

Reputation: 589

You will need to do some sort of transformation, such as map(), to add the properties to the DataFrame.
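Here is a sketch of that idea staying at the DataFrame level, using withColumn instead of map(); the report values and shard count are again hypothetical, and the result can be saved with the same Cassandra writer shown in the previous answer:

import org.apache.spark.sql.functions._

val numShards = 8  // hypothetical
// Derive the shard from the clustering key so the assignment is deterministic.
val shardOf = udf((email: String) => (email.hashCode & Int.MaxValue) % numShards)

val report = customers
  .withColumnRenamed("email", "customer_email")
  .withColumn("store_id", lit(storeId))               // the connector converts the string to uuid
  .withColumn("report_name", lit("customer_report"))  // hypothetical
  .withColumn("report_time", current_timestamp())
  .withColumn("sharder", shardOf(col("customer_email")))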

Upvotes: 0
