AIBball

Reputation: 281

How do I create a Scala trait that stores data from other columns in a dataset, and then create a new dataset with a column storing that trait?

I am new to Scala and am currently studying Datasets in Scala and Spark. Based on my input dataset below, I am trying to create a new dataset (also shown below). In the new dataset, I aim to have a new column that contains a Scala trait, as Seq[order_summary]. The trait stores the corresponding Name, Ticket Number, and Seat Number taken from the input dataset.

I have used input_dataset.groupBy("Name") to organise the dataset and have tried df.withColumn("NewColumn", struct(df("a"), df("b"))) to combine different columns together. However, I would like to use a Scala trait instead, and I am also stuck on matching each name to its ticket numbers. Would anyone know how to resolve this or point me in the right direction?
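For reference, this is roughly the trait I have in mind (the names here are just my own placeholders):

```scala
// Placeholder trait describing one order summary entry
trait OrderSummary {
  def name: String
  def ticketNumber: Int
  def seatNumber: String
}

// A case class implementing the trait, one instance per input row
case class TicketOrder(name: String, ticketNumber: Int, seatNumber: String) extends OrderSummary
```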

Input dataset: input_dataset

Name is of type String, Ticket Number is of type Int, and Seat Number is of type String.

+----+---------------+-------------+
|Name| Ticket Number | Seat Number |
+----+---------------+-------------+
|Adam|      123      |     AB      |
|Adam|      456      |     AC      |
|Adam|      789      |     AD      |
|Bob |     1234      |     BA      |
|Bob |     5678      |     BB      |
|Sam |      987      |     CA      |
|Sam |      654      |     CB      |
|Sam |      321      |     CC      |
|Sam |      876      |     CD      |
+----+---------------+-------------+

Output dataset

Name Type is String. Purchase Order Summary is a trait, Seq[order_summary]

+----+-----------------------------------------------------+
|Name| Purchase Order Summary                              |
+----+-----------------------------------------------------+
|Adam|((Adam,123,AB),(Adam,456,AC),(Adam,789,AD))          | 
|Bob |((Bob,1234,BA),(Bob,5678,BB))                        |
|Sam |((Sam,987,CA),(Sam,654,CB),(Sam,321,CC),(Sam,876,CD))|
+----+-----------------------------------------------------+

Upvotes: 0

Views: 152

Answers (1)

Dasph

Reputation: 446

Pretty sure Spark has a map method.

So you could just create a case class

case class PurchaseOrderSummary(name: String, ticketNum: Int, seatNum: String)

and instantiate it inside a map from your DF, then collect it into a list.

df.map(row => PurchaseOrderSummary(row.getString(0), row.getInt(1), row.getString(2))).collectAsList

collectAsList retrieves the data to the driver as a java.util.List[PurchaseOrderSummary]; use collect instead if you want a Scala Array. Note that mapping a Dataset to a case class needs an encoder, so make sure spark.implicits._ is in scope.
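The map/collect above gives you one element per row, but your expected output has one row per name. One option (a sketch, assuming the data fits on the driver) is to group the collected results with plain Scala collections:

```scala
case class PurchaseOrderSummary(name: String, ticketNum: Int, seatNum: String)

// Sample data standing in for the rows collected from the Dataset
val collected = List(
  PurchaseOrderSummary("Adam", 123, "AB"),
  PurchaseOrderSummary("Adam", 456, "AC"),
  PurchaseOrderSummary("Bob", 1234, "BA")
)

// One entry per name, each holding that person's order summaries in input order
val byName: Map[String, List[PurchaseOrderSummary]] = collected.groupBy(_.name)

println(byName("Adam").map(_.ticketNum)) // List(123, 456)
```

If the data is too large to collect, it is better to do the grouping on the Spark side with groupBy("Name") and collect_list over a struct of the three columns, and only collect the already-grouped result.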

Upvotes: 0
