Reputation: 281
I am new to Scala and am currently studying datasets in Scala and Spark. Based on my input dataset below, I am trying to create a new dataset (see below). In the new dataset, I aim to have a new column that contains a Scala trait, Seq[order_summary]. The trait stores the corresponding Name, Ticket Number, and Seat Number taken from the input dataset.
I have used input_dataset.groupBy("Name") to organise the dataset and have tried df.withColumn("NewColumn", struct(df("a"), df("b"))) to combine different columns. However, I would like to use a Scala trait instead, and I am also stuck on matching the name to the ticket number. Would anyone know how to resolve this or point me in the right direction?
Input dataset: input_dataset
Name is of type String. Ticket Number is of type Int.
+----+---------------+-------------+
|Name| Ticket Number | Seat Number |
+----+---------------+-------------+
|Adam| 123           | AB          |
|Adam| 456           | AC          |
|Adam| 789           | AD          |
|Bob | 1234          | BA          |
|Bob | 5678          | BB          |
|Sam | 987           | CA          |
|Sam | 654           | CB          |
|Sam | 321           | CC          |
|Sam | 876           | CD          |
+----+---------------+-------------+
Output dataset:
Name is of type String. Purchase Order Summary is a trait, Seq[order_summary].
+----+-----------------------------------------------------+
|Name| Purchase Order Summary                              |
+----+-----------------------------------------------------+
|Adam|((Adam,123,AB),(Adam,456,AC),(Adam,789,AD))          |
|Bob |((Bob,1234,BA),(Bob,5678,BB))                        |
|Sam |((Sam,987,CA),(Sam,654,CB),(Sam,321,CC),(Sam,876,CD))|
+----+-----------------------------------------------------+
Upvotes: 0
Views: 152
Reputation: 446
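Spark Datasets have a map method, so you could create a case class
case class PurchaseOrderSummary(name: String, ticketNum: Int, seatNum: String)
and instantiate it inside a map over your DF, then collect the result into a list. Note that import spark.implicits._ must be in scope so that map can find an encoder for the case class.
df.map(row => PurchaseOrderSummary(row.getString(0), row.getInt(1), row.getString(2))).collectAsList
collectAsList brings the data back to the driver as a java.util.List[PurchaseOrderSummary]; use collect().toList instead if you want a Scala List. The field types match the input columns: Ticket Number is an Int, and seat values such as AB are strings, so the getters must be getInt and getString.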
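The map above builds one object per row but does not yet group the rows by name, which is what the desired output needs. Below is a minimal sketch of one way to do that grouping with groupBy and collect_list. The names OrderSummary, PurchaseOrders, PurchaseOrderExample and summarise are hypothetical, a case class stands in for the trait because Spark's built-in encoders are derived for case classes, and the local SparkSession in main is only there to make the example self-contained. The order of elements inside each collected list is not guaranteed.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Hypothetical case classes; field types follow the question
// (Ticket Number is an Int, Seat Number holds strings such as "AB").
// They are declared at the top level so Spark can derive encoders for them.
case class OrderSummary(name: String, ticketNum: Int, seatNum: String)
case class PurchaseOrders(name: String, orders: Seq[OrderSummary])

object PurchaseOrderExample {

  // Groups the rows by Name and collects each group's rows into a list of structs.
  // Assumes the input has the columns "Name", "Ticket Number" and "Seat Number".
  def summarise(input: DataFrame): Dataset[PurchaseOrders] = {
    import input.sparkSession.implicits._ // brings the case class encoders into scope

    input
      .groupBy(col("Name").as("name"))
      .agg(
        collect_list(
          struct(
            col("Name").as("name"),
            col("Ticket Number").as("ticketNum"),
            col("Seat Number").as("seatNum")
          )
        ).as("orders")
      )
      .as[PurchaseOrders]
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("purchase-order-summary")
      .getOrCreate()
    import spark.implicits._

    // A few rows from the question's input dataset.
    val inputDataset = Seq(
      ("Adam", 123, "AB"),
      ("Adam", 456, "AC"),
      ("Bob", 1234, "BA"),
      ("Sam", 987, "CA")
    ).toDF("Name", "Ticket Number", "Seat Number")

    summarise(inputDataset).show(false) // false = do not truncate long cells

    spark.stop()
  }
}
```

Each row of the result holds a name and the sequence of that person's orders, so calling collect() on it gives an Array[PurchaseOrders] on the driver if the data is small enough to fit there.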
Upvotes: 0