Reputation: 542
I have a list of objects, one field of which is itself a list (strictly, a Seq[Row]; these are RDDs), and I want to merge them together. There is a list with rows <a, b, c, d>, where d is itself another list of rows <q, r, s, t> (one field of which is yet another nested list, but let's ignore that for simplicity). I want to change this into a list of <a, b, c, q1, r1, s1, t1>, <a, b, c, q2, r2, s2, t2> ...
I can extract the information into case classes etc. and then put them together, but I feel there should be a way to use zip, map etc. to write this in a more functional manner. How should I do so?
Edit: Detailed description:
The lists come from a nested RDD table on HDFS.
parent: <Long, String, String, Long, String, Float, Seq[Row] foolist >
foolist: <String, String, Long, Int, Seq[Row] barlist >
barlist: <String, Boolean, Int, Long, Seq[Row] list1, Seq[Row] list2 >
They have more fields than stated. Other than for the parent object, I don't need to filter out any fields in the final result. A single row in the parent would become a collection of rows holding the values of:
{parent row}, {foo row 1}, {barlist row 1}
{parent row}, {foo row 1}, {barlist row 2}
{parent row}, {foo row 1}, {barlist row N}
{parent row}, {foo row 2}, {barlist row 1}
{parent row}, {foo row 2}, {barlist row N}
...
{parent row}, {foo row M}, {barlist row N}
These are not tuples, just plain lists of fields (Long, String, String, Long, String, Float, String, String, Long, Int, String, Boolean, Int, Long ..)
Upvotes: 0
Views: 1174
Reputation: 850
You can use this:
def flat(seq: Seq[Any]): Seq[Any] = seq flatMap {
  case sq: Seq[_] => flat(sq)
  case x => Seq(x)
}
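As a quick illustration of what this recursive flatten produces, here is a minimal sketch on plain Scala collections. The sample value `nested` is hypothetical and merely stands in for a row with a nested list; no Spark types are involved.

```scala
// Same recursive flatten as above, repeated so the snippet is self-contained.
def flat(seq: Seq[Any]): Seq[Any] = seq.flatMap {
  case sq: Seq[_] => flat(sq) // descend into nested sequences
  case x          => Seq(x)   // keep scalar values as they are
}

// Hypothetical sample: three scalar fields followed by a nested list.
val nested: Seq[Any] = Seq(1L, "a", "b", Seq("q", Seq(true, 2)))

val flattened: Seq[Any] = flat(nested)
// flattened == Seq(1L, "a", "b", "q", true, 2)
```

Note that this collapses everything into one flat sequence. It does not by itself produce one output row per nested element, so it would need to be combined with a flatMap over the nested list to get the row-per-child shape described in the question.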
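Edit: if you want to flatten a Row as well, you can try:
import org.apache.spark.sql.Row

def flat(seq: Seq[Any]): Seq[Any] = seq flatMap {
  case row: Row => flat(row.toSeq) // unpack a Row into its fields and recurse
  case sq: Seq[_] => flat(sq)      // recurse into nested sequences such as Seq[Row]
  case x => Seq(x)
}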
Upvotes: 2
Reputation: 8866
You can use flatMap for that:
seq.flatMap { case (a, b, c, d) => d.map { case (q, r, s, t) => (a, b, c, q, r, s, t) } }
Or:
val res = for {
  (a, b, c, d) <- seq
  (q, r, s, t) <- d
} yield (a, b, c, q, r, s, t)
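The same idea can be extended to the three-level parent / foo / bar layout from the question. The following is only a sketch on plain Scala collections, under the assumption that each row is a `Seq[Any]` whose last field is the nested list of child rows and that bar rows are treated as leaves. The names `Fields`, `scalars`, `children` and `explode` are made up for illustration; with Spark's `Row`, a `Seq[Any]` could be obtained via `row.toSeq`.

```scala
// Hypothetical stand-in for a row: scalar fields, then (optionally) a nested list of child rows.
type Fields = Seq[Any]

// Assumption: the nested list of child rows is always the last field.
def scalars(row: Fields): Fields = row.init
def children(row: Fields): Seq[Fields] = row.last.asInstanceOf[Seq[Fields]]

// One output row per (parent, foo, bar) combination, as a plain list of fields.
def explode(parents: Seq[Fields]): Seq[Fields] =
  for {
    parent <- parents
    foo    <- children(parent)
    bar    <- children(foo)
  } yield scalars(parent) ++ scalars(foo) ++ bar

// Hypothetical sample data: one parent with two foo rows.
val bar1: Fields = Seq("x", true)
val bar2: Fields = Seq("y", false)
val foo1: Fields = Seq("f1", Seq(bar1, bar2))
val foo2: Fields = Seq("f2", Seq(bar1))
val parent: Fields = Seq(1L, "p", Seq(foo1, foo2))

val rows: Seq[Fields] = explode(Seq(parent))
// rows contains:
//   Seq(1L, "p", "f1", "x", true)
//   Seq(1L, "p", "f1", "y", false)
//   Seq(1L, "p", "f2", "x", true)
```

One consequence of this shape is that a parent or foo row with an empty nested list contributes no output rows at all, much like an inner join.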
Upvotes: 2