Roy Wood

Reputation: 79

How to combine Lists of Objects inside another List of Objects in Scala

I've tried working this out and think "flatten" might be part of the solution, but I just can't get it to work.

Imagine:

case class Thing (value1: Int, value2: Int)
case class Container (string1: String, listOfThings: List[Thing], string2: String)

So my list:

List[Container]

could be any size, but for now we'll just have 3.

Inside each Container there is a list

listOfThings: List[Thing]

that could also hold any number of Things; for now we'll also just have 3.

So what I want to get is something like

val fullListOfThings: List[Thing] = List(Thing(1,1), Thing(1,2), Thing(1,3),
    Thing(2,1), Thing(2,2), Thing(2,3), Thing(3,1), Thing(3,2), Thing(3,3))

The first value in each Thing is its Container number, and the second value is the Thing's number within that Container.
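A minimal sketch of that target using plain Lists (no Spark yet), assuming placeholder values for string1/string2 and a hypothetical object name FlattenExample. flatMap maps each Container to its listOfThings and concatenates the results; map(...).flatten is the same thing in two explicit steps.

```scala
case class Thing(value1: Int, value2: Int)
case class Container(string1: String, listOfThings: List[Thing], string2: String)

object FlattenExample {
  // Hypothetical sample data: 3 Containers, each holding 3 Things.
  // Thing(c, t) means "Thing number t inside Container number c".
  val containers: List[Container] =
    (1 to 3).toList.map { c =>
      Container(s"container-$c", (1 to 3).toList.map(t => Thing(c, t)), "")
    }

  // flatMap: turn each Container into its List[Thing], then concatenate them all.
  val fullListOfThings: List[Thing] = containers.flatMap(_.listOfThings)

  // Equivalent two-step form: List[List[Thing]] first, then flatten to List[Thing].
  val viaFlatten: List[Thing] = containers.map(_.listOfThings).flatten

  def main(args: Array[String]): Unit =
    println(fullListOfThings)
}
```

Printing fullListOfThings gives List(Thing(1,1), Thing(1,2), Thing(1,3), Thing(2,1), Thing(2,2), Thing(2,3), Thing(3,1), Thing(3,2), Thing(3,3)), which matches the target list above.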

I hope all this makes sense.

To make it more complicated for me, my list of Containers is not actually a List but an RDD:

rddOfContainers: RDD[Container]

and what I need at the end is an RDD of Things:

fullRddOfThings: RDD[Thing]

In Java, which I'm more used to, this would be pretty straightforward, but Scala is different. I'm pretty new to Scala and am having to learn it on the fly, so a full explanation would be very welcome.
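To bridge the Java habit, here is a hedged sketch comparing the nested-loop approach you might write in Java with two idiomatic Scala forms. The object and method names are hypothetical. A for comprehension with two generators and a yield is rewritten by the compiler into flatMap plus map, so all three methods return the same list.

```scala
import scala.collection.mutable.ListBuffer

case class Thing(value1: Int, value2: Int)
case class Container(string1: String, listOfThings: List[Thing], string2: String)

object JavaStyleVsScala {
  // Java-style: nested loops appending into a mutable buffer.
  def imperative(containers: List[Container]): List[Thing] = {
    val buffer = ListBuffer.empty[Thing]
    for (container <- containers) {
      for (thing <- container.listOfThings) {
        buffer += thing
      }
    }
    buffer.toList
  }

  // for comprehension with yield; the compiler rewrites it to
  // containers.flatMap(container => container.listOfThings.map(thing => thing))
  def comprehension(containers: List[Container]): List[Thing] =
    for {
      container <- containers
      thing     <- container.listOfThings
    } yield thing

  // The direct one-liner.
  def direct(containers: List[Container]): List[Thing] =
    containers.flatMap(_.listOfThings)
}
```

The mutable version works, but the flatMap form is shorter and is the one that carries over to Spark, since RDD also provides flatMap.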

I'd like to avoid bringing in too many external libraries if I can. In the meantime, I'll keep reading. Thanks!

Upvotes: 0

Views: 1867

Answers (2)

Lukas Eichler

Reputation: 5903

// flatMap already returns an RDD[Thing]; no second flatMap or sc.parallelize is needed
val rddOfThings = rddOfContainers.flatMap(x => x.listOfThings)

Upvotes: 0

Odomontois

Reputation: 16308

With an RDD, as with any other proper Scala collection, you can use flatMap for this kind of operation:

val containers = sc.parallelize(Seq(
  Container("",List(Thing(1,2), Thing(2,3)),""), 
  Container("", Nil,""), 
  Container("",List(Thing(3,4)),"")))
//containers: org.apache.spark.rdd.RDD[Container]
val things = containers flatMap (_.listOfThings)
//things: org.apache.spark.rdd.RDD[Thing]
things.collect()
//res2: Array[Thing] = Array(Thing(1,2), Thing(2,3), Thing(3,4))
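Because RDD.flatMap mirrors the standard collections' flatMap, the same expression can be tried locally without a SparkContext. A minimal sketch, assuming only the case classes from the question; the object name SameFlatMapEverywhere is hypothetical. It reuses the data from the RDD example above and applies the identical lambda to a List, a Vector and an Iterator.

```scala
case class Thing(value1: Int, value2: Int)
case class Container(string1: String, listOfThings: List[Thing], string2: String)

object SameFlatMapEverywhere {
  // Same data as the RDD example, held in a local List instead.
  val sample: List[Container] = List(
    Container("", List(Thing(1, 2), Thing(2, 3)), ""),
    Container("", Nil, ""), // an empty container contributes nothing to the result
    Container("", List(Thing(3, 4)), ""))

  // The identical lambda works on each collection type.
  val fromList: List[Thing]     = sample.flatMap(_.listOfThings)
  val fromVector: Vector[Thing] = sample.toVector.flatMap(_.listOfThings)
  val fromIterator: List[Thing] = sample.iterator.flatMap(_.listOfThings).toList
}
```

Each of these yields Thing(1,2), Thing(2,3), Thing(3,4) in that order, the same elements that things.collect() returns for the RDD.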

Upvotes: 2
