Shalini Baranwal
Shalini Baranwal

Reputation: 3008

Can we replace map with flatMap?

I was trying to find line with maximum words, and i wrote the following lines, to run on spark-shell:

import java.lang.Math

val counts = textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))

But since, map is one to one , and flatMap is one to either zero or anything. So i tried replacing map with flatMap, in above code. But its giving error as:

<console>:24: error: type mismatch;
found   : Int
required: TraversableOnce[?]
   val counts = F1.flatMap(s => s.split(" ").size).reduce((a,b)=> Math.max(a,b))

If anybody could make me understand the reason, it will really be helpful.

Upvotes: 1

Views: 1680

Answers (3)

puppylpg
puppylpg

Reputation: 1220

After reading Tired of Null Pointer Exceptions? Consider Using Java SE 8's Optional!'s part about why use flatMap() rather than Map(), I have realized the truly reason why flatMap() can not replace map() is that map() is not a special case of flatMap().

It's true that flatMap() means one-to-many, but that's not the only thing flatMap() does. It can also strip outer Stream() if put it simply.

See the definations of map and flatMap:

Stream<R> map(Function<? super T, ? extends R> mapper)
Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper)

the only difference is the type of returned value in inner function. What map() returned is "Stream<'what inner function returned'>", while what flatMap() returned is just "what inner function returned".

So you can say that flatMap() can kick outer Stream() away, but map() can't. This is the most difference in my opinion, and also why map() is not just a special case of flatMap().

ps:

If you really want to make one-to-one with flatMap, then you should change it into one-to-List(one). That means you should add an outer Stream() manually which will be stripped by flatMap() later. After that you'll get the same effect as using map().(Certainly, it's clumsy. So don't do like that.)

Here are examples for Java8, but the same as Scala:

  1. use map():

     list.stream().map(line -> line.split(" ").length)
    
  2. deprecated use flatMap():

    list.stream().flatMap(line -> Arrays.asList(line.split(" ").length).stream())
    

Upvotes: 0

Vince.Bdn
Vince.Bdn

Reputation: 1175

flatMap must return an Iterable which is clearly not what you want. You do want a map because you want to map a line to the number of words, so you want a one-to-one function that takes a line and maps it to the number of words (though you could create a collection with one element, being the size of course...).

FlatMap is meant to associate a collection to an input, for instance if you wanted to map a line to all its words you would do:

val words = textFile.flatMap(x => x.split(" "))

and that would return an RDD[String] containing all the words.

In the end, map transforms an RDD of size N into another RDD of size N (e.g. your lines to their length) whereas flatMap transforms an RDD of size N into an RDD of size P (actually an RDD of size N into an RDD of size N made of collections, all these collections are then flattened to produce the RDD of size P).

P.S.: one last word that has nothing to do with your problem, it is more efficient to do (for a string s)

val nbWords = s.split(" ").length

than call .size(). Indeed, the split method returns an array of String and arrays do not have a size method. So when you call .size() you have an implicit conversion from Array[String] to SeqLike[String] which creates new objects. But Array[T] do have a length field so there's no conversion calling length. (It's a detail but I think it's good habit though).

Upvotes: 1

Alexey Romanov
Alexey Romanov

Reputation: 170859

Any use of map can be replaced by flatMap, but the function argument has to be changed to return a single-element List: textFile.flatMap(line => List(line.split(" ").size)). This isn't a good idea: it just makes your code less understandable and less efficient.

Upvotes: 0

Related Questions