Nephilim

Reputation: 130

How to insert set type into cassandra from a dataframe in spark

I have a data frame which looks like this -

+-------------+---------------+-----------------+-------------+-------------+
| Address_Type|    Address_Zip|     Address_City|         Name|           ID|
+-------------+---------------+-----------------+-------------+-------------+
|         HOME|         141101|           Nevada|       George|       SO-123|
+-------------+---------------+-----------------+-------------+-------------+
|       OFFICE|         123561|               LA|       George|       SO-123|
+-------------+---------------+-----------------+-------------+-------------+
|         HOME|         141234|         New York|         Jane|       SC-128|
+-------------+---------------+-----------------+-------------+-------------+
|         BILL|         111009|             UTAH|         Jane|       SC-128|
+-------------+---------------+-----------------+-------------+-------------+

I'm trying to save the data in Cassandra, where there is a field named Address of type Set. I want to save the address as the combination of all the fields associated with the address, so that the new DataFrame looks like -

+-------------+-------------+----------------------------------------------------+
|         Name|           ID|                                             Address|
+-------------+-------------+----------------------------------------------------+
|       George|       SO-123|{"Address_Type: "HOME", "Address_City": "Nevada",...|
+-------------+-------------+----------------------------------------------------+
|         Jane|       SC-128|{"Address_Type: "HOME", "Address_City": "New York",.|
+-------------+-------------+----------------------------------------------------+

Then I can easily save it to the Cassandra table.

How can I do this?

Upvotes: 1

Views: 637

Answers (1)

RussS

Reputation: 16576

All that needs to happen is to match the DataFrame up with the Cassandra table. If you are inserting into a Cassandra table with a column of type Set, you just need a DataFrame whose schema contains a column of the same name and of type Array, where the internal structure of the elements matches the Address type.

So in your case the DataFrame should look like | Name | ID | Addresses Array<Address> |, which would match a Cassandra table | Name String, ID String, Addresses Set<Address> |.

With that matching, the command would be df.write.format("org.apache.spark.sql.cassandra").options(...).save()

Upvotes: 1
