Reputation: 3374

Difference between these two function formats

I am working with Spark and am not an expert in Scala. I have come across two variants of the map function. Could you please explain the difference between them?

First variant (the format I know):

val.map( (x,y) => x.size())

Second variant (this one is applied to a tuple):

val.map({case (x, y) => y.toString()});

The type of val is RDD[(IntWritable, Text)]. When I tried the first function, it gave the error below.

type mismatch; found : (org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text) ⇒ Unit required: ((org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text)) ⇒ Unit

When I added an extra pair of parentheses, it said:

Tuples cannot be directly destructured in method or function parameters.

Upvotes: 2

Answers (3)

Markus1189

Reputation: 2869

Well you say:

The type of val is RDD[(IntWritable, Text)]

so it is a tuple of arity 2 with IntWritable and Text as components.

If you say

val.map( (x,y) => x.size())

you are passing a Function2, a function with two arguments, to the map function. In Scala 2 this does not compile, because map expects a function with one argument (here, one that takes the whole tuple). What you can do is the following:

val.map((xy: (IntWritable, Text)) => xy._2.toString)

This uses ._2 to get the second part of the tuple, which is passed in as xy (the type annotation is not required, but it makes things clearer).

Now the second variant (you can leave out the outer parens):

val.map { case (x, y) => y.toString() }

This is special Scala syntax for a pattern-matching anonymous function: it immediately matches on the tuple that is passed in, giving access to its x and y parts. The same syntax is used to create a PartialFunction, which also works here because PartialFunction extends the regular Function1 class (Function1[A, B] can be written as A => B) with one argument.
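To make the comparison concrete, here is a minimal sketch of the working forms side by side. It assumes a plain List[(Int, String)] as a stand-in for the RDD[(IntWritable, Text)] from the question, so it runs without Spark or Hadoop; the object and value names are made up for illustration.

```scala
object TupleMapDemo {
  // Hypothetical stand-in for the RDD[(IntWritable, Text)] in the question
  val pairs: List[(Int, String)] = List((1, "one"), (2, "two"))

  // One tuple parameter, second component read with ._2
  val viaAccessor: List[String] = pairs.map(xy => xy._2)

  // Pattern-matching anonymous function that destructures the tuple
  val viaCase: List[String] = pairs.map { case (_, y) => y }

  // The same literal typed explicitly as a PartialFunction; it can be
  // passed to map because PartialFunction extends Function1
  val second: PartialFunction[(Int, String), String] = { case (_, y) => y }
  val viaPartial: List[String] = pairs.map(second)

  def main(args: Array[String]): Unit = {
    println(viaAccessor)
    println(viaCase)
    println(viaPartial)
  }
}
```

All three calls produce the same list of second components; which one to use is mostly a matter of readability.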

Hope that makes it more clear :)

Upvotes: 3

marios

Reputation: 8996

Your first example is a function that takes two arguments and returns a String. This is similar to this example:

scala> val f = (x:Int,y:Int) => x + y
f: (Int, Int) => Int = <function2>

You can see that the type of f is (Int, Int) => Int (I changed it slightly to return an Int instead of a String). This means it is a function that takes two Ints as arguments and returns an Int as a result.

Now the second example you have is syntactic sugar (a shortcut) for writing something like this:

scala> val g = (k: (Int, Int)) => k match { case (x: Int, y: Int) => x + y }
g: ((Int, Int)) => Int = <function1>

You see that the type of g is now ((Int, Int)) => Int. Can you spot the difference? The input type of g has two pairs of parentheses. This shows that g takes one argument, and that argument must be a Tuple2[Int, Int] (or (Int, Int) for short).

Going back to your RDD, what you have is a collection of Tuple2[IntWritable, Text], so the second function will work, whereas the first one will not.
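If you already have a two-argument function like f, one way to use it on a collection of pairs is Function2's tupled method, which converts it into a one-argument function on a tuple. The sketch below assumes a plain List of Int pairs rather than an RDD, and the object and value names are only illustrative.

```scala
object TupledDemo {
  // Two-argument function, same shape as f above
  val f: (Int, Int) => Int = (x, y) => x + y

  // tupled converts it to a one-argument function that takes a pair
  val g: ((Int, Int)) => Int = f.tupled

  // g has the shape map expects for a collection of pairs
  val sums: List[Int] = List((1, 2), (3, 4)).map(g)

  def main(args: Array[String]): Unit =
    println(sums) // prints List(3, 7)
}
```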

Upvotes: 0

Alfonso Men

Reputation: 1

I tried this in the REPL:

scala> val l = List(("firstname", "tom"), ("secondname", "kate"))
l: List[(String, String)] = List((firstname,tom), (secondname,kate))

scala> l.map((x, y) => x.size)
<console>:9: error: missing parameter type
Note: The expected type requires a one-argument function accepting a    2-Tuple.
  Consider a pattern matching anonymous function, `{ case (x, y) =>  ... }`
          l.map((x, y) => x.size)

Maybe this can give you some inspiration.

Upvotes: 0
