Chenson

Reputation: 33

Calculate co-occurrence terms with Spark using Scala

I want to calculate co-occurrence terms using Scala, but I have run into some problems.

This is my code:

val path = "pg100.txt"
val words = sc.textFile(path).map(_.toLowerCase.split("[\\s*$&#/\"'\\,.:;?!\\[\\](){}<>~\\-_]+").map(_.trim).sorted)
val coTerm = words.map { line =>
  for {
    i <- 0 until line.length
    j <- (i + 1) until line.length
  } {
    ((line(i), line(j)), 1)
  }
}

The expected output should be:

coTerm.collect
res48: Array[Unit] = Array(((word1, word2), 1), ((word1, word3), 1), ((word2, word3), 1)...

But my output is the following:

coTerm.collect
res51: Array[Unit] = Array((), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), ()....

I don't understand why I can use println inside .map to print the word pairs, but cannot emit them as output.

Upvotes: 3

Views: 1273

Answers (1)

Darshan

Reputation: 2333

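The cause is that you are not actually returning any records from your map. A for expression without yield is a for loop: the compiler turns it into foreach calls, so it evaluates to Unit. The body still runs for its side effects (which is why println works), but the tuples it builds are discarded, and every line becomes ().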
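To see the difference without Spark, here is a minimal plain-Scala sketch that builds the same pairs with and without yield. The object name ForYieldDemo and the three-word sample line are illustrative only, not taken from the question.

```scala
// Plain Scala, no Spark needed. ForYieldDemo and the sample line are
// illustrative names, not taken from the question.
object ForYieldDemo {
  val line: Array[String] = Array("a", "b", "c")

  // No yield: the compiler rewrites this into nested foreach calls.
  // The tuple is built on every iteration and then thrown away,
  // so the whole expression evaluates to () (Unit).
  val noYield: Unit = for {
    i <- 0 until line.length
    j <- (i + 1) until line.length
  } {
    ((line(i), line(j)), 1)
  }

  // With yield: rewritten into flatMap/map, so the tuples are collected.
  val withYield: IndexedSeq[((String, String), Int)] = for {
    i <- 0 until line.length
    j <- (i + 1) until line.length
  } yield ((line(i), line(j)), 1)

  def main(args: Array[String]): Unit = {
    println(noYield)   // prints ()
    println(withYield) // prints Vector(((a,b),1), ((a,c),1), ((b,c),1))
  }
}
```

The same rewrite happens inside words.map in the question, which is why collect shows one () per input line.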

Use yield to return the records from the for expression, as shown below:

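val coTerm = words.map { line =>
  // yield makes the for expression return the pairs instead of Unit.
  // Note: map gives one IndexedSeq of pairs per line; use flatMap instead
  // if you want a single flat RDD of pairs.
  for {
    i <- 0 until line.length
    j <- (i + 1) until line.length
  } yield {
    ((line(i), line(j)), 1)
  }
}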
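With map, collect returns an array of per-line collections rather than the flat list shown in the question's expected output. A minimal sketch, assuming the goal is a flat list of pairs followed by a count per pair: flatMap flattens the per-line pairs, and reduceByKey(_ + _) sums the 1s. To keep it runnable without a cluster, the counting step is simulated below with plain Scala collections. CoTermSketch, tokenize, coTermPairs and countLocally are hypothetical names, and dropping empty tokens is an added assumption (the question's split produces an empty first token when a line starts with a delimiter, and an empty line splits into a single empty string).

```scala
// Hypothetical helper object; names are illustrative, not from the original post.
object CoTermSketch {
  // Same delimiter regex as in the question.
  val delimiters = "[\\s*$&#/\"'\\,.:;?!\\[\\](){}<>~\\-_]+"

  // Lowercase, split, trim, drop empty tokens (added assumption), sort.
  def tokenize(line: String): Array[String] =
    line.toLowerCase.split(delimiters).map(_.trim).filter(_.nonEmpty).sorted

  // Every (i < j) pair in one tokenized line, each with a count of 1.
  def coTermPairs(tokens: Array[String]): Seq[((String, String), Int)] =
    for {
      i <- 0 until tokens.length
      j <- (i + 1) until tokens.length
    } yield ((tokens(i), tokens(j)), 1)

  // Local stand-in for rdd.flatMap(coTermPairs).reduceByKey(_ + _):
  // flatten the per-line pairs, then sum the 1s for each key.
  def countLocally(lines: Seq[String]): Map[(String, String), Int] =
    lines
      .map(tokenize)
      .flatMap(coTermPairs)
      .groupBy(_._1)
      .map { case (pair, ones) => pair -> ones.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val counts = countLocally(Seq("b a, c", "a b"))
    // prints ((a,b),2), ((a,c),1), ((b,c),1), one per line
    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

On an RDD the same helpers would plug in as sc.textFile(path).map(CoTermSketch.tokenize).flatMap(CoTermSketch.coTermPairs).reduceByKey(_ + _), assuming the object is available to the executors (for example, compiled into the application jar).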

Upvotes: 2
