Reputation: 130
I have a DataFrame that looks like this:
+-------------+---------------+-----------------+-------------+-------------+
| Address_Type|    Address_Zip|     Address_City|         Name|           ID|
+-------------+---------------+-----------------+-------------+-------------+
|         HOME|         141101|           Nevada|       George|       SO-123|
+-------------+---------------+-----------------+-------------+-------------+
|       OFFICE|         123561|               LA|       George|       SO-123|
+-------------+---------------+-----------------+-------------+-------------+
|         HOME|         141234|         New York|         Jane|       SC-128|
+-------------+---------------+-----------------+-------------+-------------+
|         BILL|         111009|             UTAH|         Jane|       SC-128|
+-------------+---------------+-----------------+-------------+-------------+
I'm trying to save the data in Cassandra, where the table has a field named Address of type Set. I want each address entry to combine all of the address-related fields, so that the new DataFrame looks like this:
+-------------+-------------+-----------------------------------------------------+
|         Name|           ID|                                              Address|
+-------------+-------------+-----------------------------------------------------+
|       George|       SO-123|{"Address_Type": "HOME", "Address_City": "Nevada",...|
+-------------+-------------+-----------------------------------------------------+
|         Jane|       SC-128|{"Address_Type": "HOME", "Address_City": "New York",.|
+-------------+-------------+-----------------------------------------------------+
I could then easily save it to the Cassandra table.
How can I do this?
Upvotes: 1
Views: 637
Reputation: 16576
All that needs to happen is to match the DataFrame's schema to the Cassandra table. So if you are inserting into a Cassandra table that has a column of type Set, you just need a DataFrame whose schema contains a column of that name of type Array, where the internal structure of the array elements matches the Address type.
So in your case the dataframe should look like
| Name | ID | Addresses Array<Address> |
Which would match a Cassandra table
| Name String, ID String, Addresses Set<Address> |
With the schemas matching, the write command would be
df.write.format("org.apache.spark.sql.cassandra").options(...).save()
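One way to get from the flat DataFrame in the question to that shape is to pack the address columns into a struct and collect the structs per person. This is only a minimal sketch, assuming Spark 2.x or later (for SparkSession), string-typed zip codes, and that the struct field names match the fields of your Cassandra address type. The object name GroupAddresses and the local session are placeholders for illustration, and the output column is called Addresses to follow the table layout above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_set, struct}

object GroupAddresses {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption); a real job would
    // also carry the Cassandra connection settings.
    val spark = SparkSession.builder()
      .appName("group-addresses")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question; zip codes are kept as strings here.
    val df = Seq(
      ("HOME",   "141101", "Nevada",   "George", "SO-123"),
      ("OFFICE", "123561", "LA",       "George", "SO-123"),
      ("HOME",   "141234", "New York", "Jane",   "SC-128"),
      ("BILL",   "111009", "UTAH",     "Jane",   "SC-128")
    ).toDF("Address_Type", "Address_Zip", "Address_City", "Name", "ID")

    // Pack the three address columns into one struct per row, then collect
    // the distinct structs for each (Name, ID) pair into an array column.
    val grouped = df
      .groupBy(col("Name"), col("ID"))
      .agg(
        collect_set(
          struct(col("Address_Type"), col("Address_Zip"), col("Address_City"))
        ).as("Addresses")
      )

    // Addresses should appear as an array of structs in the schema.
    grouped.printSchema()
    grouped.show(false)

    spark.stop()
  }
}
```

The grouped DataFrame is what you would hand to the write call above. The connector reads the target from its keyspace and table options, and if the column or field names in your table differ, rename them before writing so the two schemas line up.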
Upvotes: 1