MapReddy Usthili

Reputation: 288

Scala Spark: get the top word in each row of an Array

I'm unable to get the most frequent word for each row of an Array of (Int, String) pairs.

Here is the Array and the required output. Assume n is an RDD; please suggest the functions needed to get this output.

scala> n.take(10)
res3: Array[(Int, String)] = Array((4,Hi how are you ,how), (2,hello good good to hear good))

O/P : Array((4,how),(2,good)) // how is the top word in the first row, good is the top word in the second row.

The following code gets the top words across the whole RDD, but I want the top word for each row.

val msg = n.map{case(val1, val2) => (val2).mkString("")}
val words =msg.flatMap(x => x.split(" "))
val result = words.map(x => (x, 1)).reduceByKey((x, y) => x + y)
val sortResults = result.sortBy(x => (-x._2, x._1))

Thanks :)

Upvotes: 0

Views: 3122

Answers (1)

Shyamendra Solanki

Reputation: 8851

Let's first create a function to find the most frequent word in a given text:

def findMaxFrequencyWord(text: String): (String, Int) = {
    text.split("\\W+")
        .map(x => (x, 1))
        .groupBy(y => y._1)
        .map{ case (x,y) => x -> y.length }
        .toArray
        .sortBy(x => -x._2)
        .head 
}

findMaxFrequencyWord("hi, how are you, how")
> (how, 2)

Create an RDD of (Int, String):

val arr = Array((4, "how how ok"), (3, "i see, you see"), (5, "fine, it is fine"))

val n = sc.parallelize(arr)  

Find the most frequent word in each String in the RDD:

val result = n.map{ case (x, y)  => x -> findMaxFrequencyWord(y)._1 }

result.take(3)

> Array[(Int, String)] = Array((4, how), (3, see), (5, fine))
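
One caveat with findMaxFrequencyWord: when two words share the highest count, which one comes back depends on the Map's iteration order, and split("\\W+") yields a leading empty token if the text starts with punctuation. Below is a minimal sketch of a variant that handles both. The name topWord and the tie-breaking rule (alphabetically first word wins) are assumptions for illustration, not part of the question, and a plain Seq stands in for the RDD so the snippet runs without Spark.

```scala
// Hypothetical helper, not from the original answer.
// Returns None when the text contains no words; ties on the count
// are broken by taking the alphabetically first word (an assumed rule).
def topWord(text: String): Option[(String, Int)] = {
  val counts = text
    .split("\\W+")
    .filter(_.nonEmpty) // drop the empty token produced by leading punctuation
    .groupBy(identity)
    .map { case (word, occurrences) => (word, occurrences.length) }

  if (counts.isEmpty) None
  else Some(counts.toSeq.sortBy { case (word, count) => (-count, word) }.head)
}

// Local stand-in for the RDD rows used above.
val rows = Seq((4, "how how ok"), (3, "i see, you see"), (5, "fine, it is fine"))

// Same per-row mapping as before, keeping only the word.
val tops = rows.map { case (id, text) => id -> topWord(text).map(_._1) }
```

With an RDD, the same mapping should carry over as n.map{ case (x, y) => x -> topWord(y).map(_._1) }, giving Option[String] values so that rows with no words stay visible as None instead of throwing on .head.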

Upvotes: 3
