Learner

Reputation: 33

How to count the occurrences of each individual word in a file?

hello how are you
I am fine
how are you
I am also fine
Thank you

This is the file I have. I want to count how many times each word occurs in the file, so the output should look like this:

(hello,1)
(how,2)
(are,2)
(you,3)

and so on.

I tried this:

val rdd = sc.textFile("/path")
val rdd1= rdd.map(x=>(x.distinct,x.length)).collect

but it didn't work. Please help.

Answers (1)

mck

Reputation: 42392

You can use countByValue():

rdd.flatMap(_.split(" ")).countByValue()

which returns a map:

Map(are -> 2, am -> 2, I -> 2, you -> 3, also -> 1, how -> 2, Thank -> 1, fine -> 2, hello -> 1)
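
As a rough illustration of why the attempt in the question does not work, and of what the split/flatMap step does, here is a sketch using plain Scala collections instead of an RDD. The names WordCountSketch, attempt and countWords are made up for this example; the sample lines are the ones from the question.

```scala
object WordCountSketch {
  // Sample lines copied from the question
  val lines: Seq[String] = Seq(
    "hello how are you",
    "I am fine",
    "how are you",
    "I am also fine",
    "Thank you"
  )

  // The attempt from the question: on a String, `distinct` removes repeated
  // characters and `length` counts characters, so no words are involved.
  def attempt(ls: Seq[String]): Seq[(String, Int)] =
    ls.map(x => (x.distinct, x.length))

  // Split every line into words, flatten into one sequence of words,
  // then count how often each word occurs.
  def countWords(ls: Seq[String]): Map[String, Int] =
    ls.flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    println(attempt(lines).head) // (helo waryu,17)
    countWords(lines).foreach(println)
  }
}
```

On an RDD the same idea applies: flatMap turns an RDD of lines into an RDD of words, and countByValue() does the counting.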

If you want an RDD, you can do:

val rdd1 = sc.parallelize(rdd.flatMap(_.split(" ")).countByValue().toSeq)
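
Note that countByValue() returns the whole result to the driver as a local Map, so the parallelize call above sends it back out to the cluster again. If the number of distinct words is large, one possible alternative is the usual map/reduceByKey pattern, which keeps the counts in an RDD throughout. This is a minimal sketch, assuming a local SparkSession for testing. DistributedWordCount and wordCounts are hypothetical names, and splitting on "\\s+" plus dropping empty strings is my own choice rather than something from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object DistributedWordCount {
  // Hypothetical helper: turns an RDD of lines into an RDD of (word, count) pairs.
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.split("\\s+")) // assumption: words are separated by whitespace
      .filter(_.nonEmpty)       // drop empty strings left by leading whitespace
      .map(word => (word, 1))   // pair each word with an initial count of 1
      .reduceByKey(_ + _)       // sum the counts per word on the executors

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only for trying the sketch out
    val spark = SparkSession.builder()
      .appName("DistributedWordCount")
      .master("local[*]")
      .getOrCreate()

    // "/path" is the placeholder path from the question
    val rdd = spark.sparkContext.textFile("/path")

    // Prints pairs such as (hello,1); the order is not guaranteed
    wordCounts(rdd).collect().foreach(println)

    spark.stop()
  }
}
```

For a file as small as the one in the question, either approach should be fine; the difference only matters once the result no longer fits comfortably on the driver.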
