Reputation: 2949
I have data shaped like this: an Array of Arrays of tuples (Int, Int, String).
val data = Array(Array((1,200,"vimal"),(2,12,"amar"),(1,120,"vimal"),
(2,120,"kamal"),(1,120,"jay")),Array((1,200,"vimal"),(1,120,"vimal"),
(2,120,"kamal"),(1,120,"jay")))
val dist = sc.parallelize(data)
I would like to keep only the tuples whose first integer is 2.
The result should look like this:
(2,12,"amar"),(2,120,"kamal"),(2,120,"kamal")
Upvotes: 2
Views: 4359
Reputation: 20415
Using a for comprehension like this,
for ( xs <- data; t @ (a,b,c) <- xs if a == 2 ) yield t
where t is bound to each tuple and we keep only those tuples whose first item is 2. Likewise
for ( t @ (a,b,c) <- data.flatten if a == 2 ) yield t
gives the same result; here we flatten out the nested arrays first. Even shorter is this,
for ( t <- data.flatten if t._1 == 2 ) yield t
And as short as it gets (using filter, as already proposed),
data.flatten.filter(_._1 == 2)
With collect, consider this pattern matching,
data.flatten.collect { case t @ (2,_,_) => t }
We can also partition a flattened version of data by the desired criterion (first item in the tuple is 2) and take the first element of the resulting pair,
data.flatten.partition(_._1 == 2)._1
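To check that these variants agree, here is a minimal sketch that runs them against the data from the question on plain Scala collections. The object name FilterTuplesDemo and the val names are illustrative only and do not come from the original posts; results are converted to List so they can be compared with ==.

```scala
object FilterTuplesDemo {
  // Sample data copied from the question.
  val data: Array[Array[(Int, Int, String)]] = Array(
    Array((1, 200, "vimal"), (2, 12, "amar"), (1, 120, "vimal"),
          (2, 120, "kamal"), (1, 120, "jay")),
    Array((1, 200, "vimal"), (1, 120, "vimal"),
          (2, 120, "kamal"), (1, 120, "jay"))
  )

  // for comprehension over the flattened arrays
  val viaFor: List[(Int, Int, String)] =
    (for (t <- data.flatten if t._1 == 2) yield t).toList

  // plain filter on the first tuple element
  val viaFilter: List[(Int, Int, String)] =
    data.flatten.filter(_._1 == 2).toList

  // collect with a pattern that only matches tuples starting with 2
  val viaCollect: List[(Int, Int, String)] =
    data.flatten.collect { case t @ (2, _, _) => t }.toList

  // partition returns (matching, nonMatching); keep the matching half
  val viaPartition: List[(Int, Int, String)] =
    data.flatten.partition(_._1 == 2)._1.toList

  // The result the question asks for.
  val expected: List[(Int, Int, String)] =
    List((2, 12, "amar"), (2, 120, "kamal"), (2, 120, "kamal"))
}
```

Note that partition also builds the array of non-matching tuples, which is then thrown away, so filter or collect is the more direct choice when only the matches are needed.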
Upvotes: 2
Reputation: 2804
I think you need something like this:
dist.flatMap { arrayElement =>
  arrayElement.filter {
    // keep only tuples whose first element is 2
    case (x, _, _) => x == 2
  }
}
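As a rough illustration of what this flatMap does, here is the same shape on a local List instead of an RDD, so it can be run without a Spark setup. The object name FlatMapFilterDemo and the val twos are hypothetical names chosen for this sketch.

```scala
object FlatMapFilterDemo {
  // Sample data copied from the question.
  val data: Array[Array[(Int, Int, String)]] = Array(
    Array((1, 200, "vimal"), (2, 12, "amar"), (1, 120, "vimal"),
          (2, 120, "kamal"), (1, 120, "jay")),
    Array((1, 200, "vimal"), (1, 120, "vimal"),
          (2, 120, "kamal"), (1, 120, "jay"))
  )

  // Local stand-in for the RDD: each inner array is filtered,
  // and flatMap concatenates the surviving tuples into one list.
  val twos: List[(Int, Int, String)] =
    data.toList.flatMap { arrayElement =>
      arrayElement.filter { case (x, _, _) => x == 2 }.toList
    }
}
```

On the RDD from the question, the flatMap call returns another RDD, so an action such as dist.flatMap { ... }.collect() is needed to bring the matching tuples back to the driver as an Array.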
Upvotes: 3