Peace_Man

Reputation: 45

Reverse the Key and value of a Pair in Spark

I would like to swap the key and value in each pair of customer ID and number of visits:

scala> val pair = input.map(line => line.split(" ") (2)).map(input => (input, 1)).reduceByKey(_+_).foreach(println)

Result:

(48784,3)
(47847,10)
(87673,8)
(67654,4)

I would like to reverse this pair to look something like this:

(3,48784)
(10,47847)
(8,87673)
(4,67654)

I've researched similar answers to this question on this site and have tried the following:

  1. input.map{ pair => pair.swap}

  2. val PairReverse = pair.map(X => (x._2,x._1))

  3. val PairReverse = pair.map(X => ((1),(0))

I keep getting the following error:

"Map value is not a member of this Unit"

Upvotes: 1

Views: 4437

Answers (4)

Ramesh Maharjan

Reputation: 41987

Just remove .foreach(println) from the following line and you should be fine:

val pair = input.map(line => line.split(" ") (2)).map(input => (input, 1)).reduceByKey(_+_).foreach(println)

This is because foreach is an action and returns Unit, so pair ends up holding Unit rather than an RDD. The definition of foreach looks like this:

def foreach(f: T => Unit): Unit
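The effect of that signature can be seen without Spark. The following is a minimal sketch that uses a plain Scala List as a stand-in for the RDD (an assumption made only for illustration; the names `counts`, `printed` and `swapped` are hypothetical). It is written script-style, so it can be pasted into the Scala REPL.

```scala
// A plain List stands in for the RDD here (assumption for illustration only).
val counts = List(("48784", 3), ("47847", 10), ("87673", 8), ("67654", 4))

// foreach runs println for its side effect and returns Unit,
// so `printed` holds () and has no map method to call.
val printed: Unit = counts.foreach(println)

// Keeping the collection itself in a val leaves it available for swapping.
val swapped = counts.map(_.swap)
swapped.foreach(println)
```

The same reasoning applies to an RDD: whatever is assigned to the val must be the RDD, not the result of printing it.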

After removing foreach(println), either of the following should work for you:

val pair = input.map(line => line.split(" ") (2)).map(input => (input, 1)).reduceByKey(_+_)
val pairReverse = pair.map(p => p.swap)
pairReverse.foreach(println)

Or

val pair = input.map(line => line.split(" ") (2)).map(input => (input, 1)).reduceByKey(_+_)
val pairReverse = pair.map(x => (x._2,x._1))
pairReverse.foreach(println)

But the last map that you tried will not do what you want, i.e.

val pair = input.map(line => line.split(" ") (2)).map(input => (input, 1)).reduceByKey(_+_)
val pairReverse = pair.map(X => ((1),(0)))
pairReverse.foreach(println)

because it ignores its argument and produces the constant tuple (1,0) for each pair you have:

(1,0)
(1,0)
(1,0)
(1,0)
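The reason is that `((1),(0))` is just the literal tuple `(1, 0)`: the parentheses around each number do nothing, and the lambda parameter is never used. Tuple fields are read with `_1` and `_2`, or by pattern matching. A small sketch on a plain Scala List (used here as an assumption in place of the RDD; `sample`, `constant`, `viaFields` and `viaCase` are hypothetical names):

```scala
// Plain List used in place of the RDD (assumption for illustration only).
val sample = List(("48784", 3), ("47847", 10))

// The parameter x is ignored; every element becomes the literal tuple (1, 0).
val constant = sample.map(x => ((1), (0)))

// Reading the tuple fields explicitly performs the swap.
val viaFields = sample.map(x => (x._2, x._1))

// Pattern matching names the two fields, which some find easier to read.
val viaCase = sample.map { case (id, visits) => (visits, id) }
```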

I hope the answer is helpful

Upvotes: 3

Puneeth Reddy V

Reputation: 1578

Here is a way to swap the key and value of each element using collect (shown on a plain Scala Map rather than an RDD):

scala> val map = Map(48784 -> 3, 47847 -> 10, 87673 -> 8, 67654 -> 4)
map: scala.collection.immutable.Map[Int,Int] = Map(48784 -> 3, 47847 -> 10, 87673 -> 8, 67654 -> 4)

scala> map.collect{
    case e => e._2 -> e._1
}
res0: scala.collection.immutable.Map[Int,Int] = Map(3 -> 48784, 10 -> 47847, 8 -> 87673, 4 -> 67654)
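One caveat with swapping a Map, sketched below: keys in a Map are unique, so if two customers have the same number of visits, only one of them survives the swap. Converting to a sequence of pairs first keeps every entry. The extra customer `11111` and the names `visits`, `swappedMap` and `swappedPairs` are hypothetical, added only to show the collision.

```scala
// 11111 is a made-up customer that shares a visit count (3) with 48784.
val visits = Map(48784 -> 3, 47847 -> 10, 87673 -> 8, 67654 -> 4, 11111 -> 3)

// Swapping inside a Map: the two entries with value 3 collapse into one key.
val swappedMap = visits.map(_.swap)

// Swapping a sequence of pairs keeps both entries with visit count 3.
val swappedPairs = visits.toSeq.map(_.swap)

println(swappedMap.size)    // one entry fewer than the input
println(swappedPairs.size)  // same number of entries as the input
```

An RDD of pairs behaves like the sequence here, not like the Map: duplicate keys are allowed after the swap.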

Upvotes: 1

koiralo

Reputation: 23119

val pair = input.map(line => line.split(" ") (2)).map(input => (input, 1)).reduceByKey(_+_).foreach(println)

This returns Unit: the trailing foreach(println) makes pair a Unit instead of an RDD.

val pair = input.map(line => line.split(" ") (2)).map(input => (input, 1)).reduceByKey(_+_)
// now print the pair rdd
pair.foreach(println)

// Now you can swap using either of the following
pair.map(_.swap)
pair.map(x => (x._2, x._1))
// Note: pair.map(X => ((1),(0))) compiles but yields the constant (1,0), not a swap

So just remove the foreach(println) and it should work as expected.
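Putting the pieces together, here is a minimal end-to-end sketch. It assumes Spark is on the classpath and runs in local mode; the input lines are made up (the question does not show them), with the customer ID as the third space-separated field to match `split(" ")(2)`. In spark-shell a session already exists, so the builder and `spark.stop()` lines can be dropped there.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation only (assumption; not needed in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("swap-sketch").getOrCreate()

// Hypothetical input lines: the third field is the customer ID.
val input = spark.sparkContext.parallelize(Seq(
  "d1 s1 48784",
  "d2 s1 48784",
  "d3 s2 67654"
))

// Count visits per customer; no foreach here, so pair stays an RDD[(String, Int)].
val pair = input.map(line => line.split(" ")(2)).map(id => (id, 1)).reduceByKey(_ + _)

// Swap to (visits, customerId).
val pairReverse = pair.map(_.swap)

// Bring the small result to the driver before printing it.
val result = pairReverse.collect().toSet
result.foreach(println)

spark.stop()
```

Printing is kept as a separate statement, so `pair` and `pairReverse` remain usable RDDs for any later transformation.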

Upvotes: 1

Alex Savitsky

Reputation: 2371

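pair is of type Unit because of the trailing foreach, which is called for its side effect and returns Unit. Also note that Scala is case-sensitive, so a parameter declared as X cannot be referred to as x. If you add .map(x => (x._2, x._1)) just before the foreach invocation, like this:

val pair = input.map(line => line.split(" ") (2)).map(input => (input, 1)).
    reduceByKey(_+_).map(x => (x._2, x._1)).foreach(println)

it should print the swapped pairs, although pair itself will still be Unit.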

Upvotes: 2
