Knight71

Reputation: 2949

How to filter an RDD containing arrays of tuples?

I have data like this: an Array of Arrays of tuples (Int, Int, String).

val data = Array(Array((1,200,"vimal"),(2,12,"amar"),(1,120,"vimal"),  
   (2,120,"kamal"),(1,120,"jay")),Array((1,200,"vimal"),(1,120,"vimal"),
   (2,120,"kamal"),(1,120,"jay")))
val dist = sc.parallelize(data)

I would like to keep only the tuples whose first integer is 2.

The result should look like

(2,12,"amar"),(2,120,"kamal"),(2,120,"kamal")

Upvotes: 2

Views: 4359

Answers (2)

elm

Reputation: 20415

Using a for comprehension on the local array data, like this,

for ( xs <- data; t @ (a,b,c) <- xs if a == 2 ) yield t 

where t is bound to each tuple and we keep those tuples whose first item is 2. Likewise

for ( t @ (a,b,c) <- data.flatten if a == 2 ) yield t

gives the same result; here we flatten the nested arrays first. Even shorter is this,

for ( t <- data.flatten if t._1 == 2 ) yield t
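
To make the first for comprehension above less magical, here is a rough sketch of the method calls it corresponds to: flatMap for the outer generator, withFilter for the guard, and map for the yield. The names viaFor and viaCalls are only illustrative, and the hand-written form is an approximation of the compiler's translation, not its exact output.

```scala
// Same nested data as in the question.
val data = Array(
  Array((1, 200, "vimal"), (2, 12, "amar"), (1, 120, "vimal"),
        (2, 120, "kamal"), (1, 120, "jay")),
  Array((1, 200, "vimal"), (1, 120, "vimal"),
        (2, 120, "kamal"), (1, 120, "jay"))
)

// The for comprehension from the answer.
val viaFor = for (xs <- data; t @ (a, b, c) <- xs if a == 2) yield t

// Approximate hand-written equivalent:
// flatMap over the outer arrays, withFilter for the guard, map for the yield.
val viaCalls = data.flatMap { xs =>
  xs.withFilter { case (a, _, _) => a == 2 }.map(t => t)
}

// Arrays compare by reference, so convert to List before printing or comparing.
println(viaFor.toList)   // List((2,12,amar), (2,120,kamal), (2,120,kamal))
println(viaCalls.toList) // List((2,12,amar), (2,120,kamal), (2,120,kamal))
```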

And as short as it gets (using filter as already proposed),

data.flatten.filter(_._1 == 2)

With collect, a pattern match selects the tuples,

data.flatten.collect { case t @ (2,_,_) => t }

We can also partition the flattened data by the desired criterion (first item in the tuple is 2) and take the first element of the resulting pair,

data.flatten.partition(_._1 == 2)._1
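
As a quick sanity check, the sketch below runs the filter, collect and partition variants side by side on the data from the question and compares each with the expected result. It uses plain Scala collections only; the names expected, byFilter, byCollect, hits and rest are illustrative.

```scala
// Same nested data as in the question.
val data = Array(
  Array((1, 200, "vimal"), (2, 12, "amar"), (1, 120, "vimal"),
        (2, 120, "kamal"), (1, 120, "jay")),
  Array((1, 200, "vimal"), (1, 120, "vimal"),
        (2, 120, "kamal"), (1, 120, "jay"))
)

// Result the question asks for.
val expected = List((2, 12, "amar"), (2, 120, "kamal"), (2, 120, "kamal"))

// Flatten once, then try each variant on the flat array.
val flat = data.flatten

val byFilter  = flat.filter(_._1 == 2)
val byCollect = flat.collect { case t @ (2, _, _) => t }
// partition returns (matching, nonMatching).
val (hits, rest) = flat.partition(_._1 == 2)

// Arrays compare by reference, so compare as Lists.
println(byFilter.toList == expected)  // true
println(byCollect.toList == expected) // true
println(hits.toList == expected)      // true
println(rest.length)                  // 6
```

Note that partition also builds the array of non-matching tuples, so filter or collect is the lighter choice when only the matches are needed.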

Upvotes: 2

Carlos Vilchez

Reputation: 2804

I think you need something like this:

dist.flatMap { arrayElement =>
  arrayElement filter {
    case (x: Int, y: Int, str: String) => x == 2
  }
}
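
A minimal sketch of the same idea that can be tried without a cluster: the per-array filter is pulled out into a helper and applied with flatMap on the local data. The alias Row and the helper keepFirstIsTwo are hypothetical names introduced here; on the RDD from the question the analogous call would presumably be dist.flatMap(xs => keepFirstIsTwo(xs)), which is not run below because it needs a SparkContext.

```scala
// Hypothetical alias for the tuple type used in the question.
type Row = (Int, Int, String)

// Per-array step: keep only the tuples whose first element is 2.
def keepFirstIsTwo(arrayElement: Array[Row]): Array[Row] =
  arrayElement.filter { case (x, _, _) => x == 2 }

// Same nested data as in the question.
val data: Array[Array[Row]] = Array(
  Array((1, 200, "vimal"), (2, 12, "amar"), (1, 120, "vimal"),
        (2, 120, "kamal"), (1, 120, "jay")),
  Array((1, 200, "vimal"), (1, 120, "vimal"),
        (2, 120, "kamal"), (1, 120, "jay"))
)

// Local stand-in for the RDD: flatMap filters each inner array
// and concatenates the survivors into one flat array.
val local = data.flatMap(xs => keepFirstIsTwo(xs))

println(local.toList) // List((2,12,amar), (2,120,kamal), (2,120,kamal))
```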

Upvotes: 3
