Reputation: 3374
I am working with Spark and am not an expert in Scala. I have come across two variants of a call to map. Could you please explain the difference between them?
First variant (the form I know):
val.map( (x,y) => x.size())
Second variant (this one is applied to a tuple):
val.map({case (x, y) => y.toString()});
The type of val is RDD[(IntWritable, Text)]. When I tried the first variant, it gave the error below.
type mismatch; found : (org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text) ⇒ Unit required: ((org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text)) ⇒ Unit
When I added extra parentheses, it said:
Tuples cannot be directly destructured in method or function parameters.
Upvotes: 2
Views: 291
Reputation: 2869
Well, you say:
The type of val is RDD[(IntWritable, Text)]
so each element is a tuple of arity 2 with IntWritable and Text as components.
If you say
val.map( (x,y) => x.size())
what you are essentially doing is passing a Function2, a function with two arguments, to the map function. This does not compile because map wants a function with one argument. What you can do is the following:
val.map((xy: (IntWritable, Text)) => xy._2.toString)
using ._2 to get the second part of the tuple, which is passed in as xy (the type annotation is not required but makes it clearer).
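As a rough, Spark-free illustration of the accessor approach, the sketch below uses a plain List in place of the RDD. The object name TupleAccessDemo and the sample data are made up for this example, and Scala 2.12 or later is assumed.

```scala
object TupleAccessDemo {
  // A plain List stands in for the RDD; each element is a 2-tuple.
  val pairs: List[(Int, String)] = List((1, "tom"), (2, "kate"))

  // The function takes ONE parameter (the whole tuple);
  // its components are read with the _1 and _2 accessors.
  val values: List[String] = pairs.map(xy => xy._2.toString)

  def main(args: Array[String]): Unit =
    println(values) // List(tom, kate)
}
```

The same shape of function should carry over to an RDD of pairs, since RDD.map also expects a one-argument function.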
Now the second variant (you can leave out the outer parens):
val.map { case (x, y) => y.toString() }
This is special Scala syntax for a pattern-matching anonymous function that immediately matches on the tuple that is passed in to access the x and y parts. The same syntax is used to create a PartialFunction, which extends the regular Function1 trait (Function1[A, B] can be written as A => B) with one argument, so it fits what map expects.
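To make the relationship between the case syntax, Function1 and PartialFunction a bit more concrete, here is a minimal sketch on a plain List. All names (CaseSyntaxDemo, secondOfOne and so on) are invented for illustration, and Scala 2.12 or later is assumed.

```scala
object CaseSyntaxDemo {
  val pairs: List[(Int, String)] = List((1, "tom"), (2, "kate"))

  // map expects a one-argument function, so the case block is used as one.
  val viaMap: List[String] = pairs.map { case (_, y) => y.toUpperCase }

  // With an explicit PartialFunction type, the same syntax yields a
  // PartialFunction that is only defined for tuples whose first part is 1.
  val secondOfOne: PartialFunction[(Int, String), String] = { case (1, y) => y }

  // A PartialFunction is a Function1, so it can be assigned to A => B.
  val asFunction: ((Int, String)) => String = secondOfOne

  // collect consults isDefinedAt and skips elements that do not match.
  val collected: List[String] = pairs.collect(secondOfOne)

  def main(args: Array[String]): Unit = {
    println(viaMap)
    println(collected)
  }
}
```

Note that calling asFunction on a tuple the pattern does not cover would throw a MatchError, which is why collect is the safer choice for patterns that are not exhaustive.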
Hope that makes it more clear :)
Upvotes: 3
Reputation: 8996
Your first example is a function that takes two arguments and returns a String. This is similar to this example:
scala> val f = (x:Int,y:Int) => x + y
f: (Int, Int) => Int = <function2>
You can see that the type of f is (Int, Int) => Int (I just slightly changed this to return an Int instead of a String), meaning that this is a function that takes two Ints as arguments and returns an Int as a result.
Now the second example you have is syntactic sugar (a shortcut) for writing something like this:
scala> val g = (k: (Int, Int)) => k match { case (x: Int, y: Int) => x + y }
g: ((Int, Int)) => Int = <function1>
You see that the type of function g is now ((Int, Int)) => Int. Can you spot the difference? The input type of g has two sets of parentheses. This shows that g takes one argument and that argument must be a Tuple2[Int, Int] (or (Int, Int) for short).
Going back to your RDD, what you have is a collection of Tuple2[IntWritable, Text], so the second function will work, whereas the first one will not.
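If you already have a two-argument function like f, one way to bridge the gap is the tupled method on Function2, which turns it into a one-argument function over a pair. The sketch below is only an illustration on a List of Int pairs; the object name TupledDemo and the data are assumptions, not part of the question.

```scala
object TupledDemo {
  // A two-argument function: type (Int, Int) => Int, i.e. Function2.
  val f: (Int, Int) => Int = (x, y) => x + y

  // tupled converts it to a one-argument function over a 2-tuple:
  // type ((Int, Int)) => Int, i.e. Function1.
  val g: ((Int, Int)) => Int = f.tupled

  val pairs: List[(Int, Int)] = List((1, 2), (3, 4))

  // map accepts g because it takes exactly one argument (the tuple).
  val sums: List[Int] = pairs.map(g)

  def main(args: Array[String]): Unit =
    println(sums) // List(3, 7)
}
```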
Upvotes: 0
Reputation: 1
I tried this in the REPL:
scala> val l = List(("firstname", "tom"), ("secondname", "kate"))
l: List[(String, String)] = List((firstname,tom), (secondname,kate))
scala> l.map((x, y) => x.size)
<console>:9: error: missing parameter type
Note: The expected type requires a one-argument function accepting a 2-Tuple.
Consider a pattern matching anonymous function, `{ case (x, y) => ... }`
l.map((x, y) => x.size)
Maybe this can give you some inspiration.
Upvotes: 0