Reputation: 33
I want to calculate co-occurrence terms using Scala, but I have run into a problem.
This is my code:
val path = "pg100.txt"
val words = sc.textFile(path).map(_.toLowerCase.split("[\\s*$&#/\"'\\,.:;?!\\[\\](){}<>~\\-_]+").map(_.trim).sorted)
val coTerm = words.map { line =>
  for {
    i <- 0 until line.length
    j <- (i + 1) until line.length
  } {
    ((line(i), line(j)), 1)
  }
}
The expected output should be:
coTerm.collect
res48: Array[Unit] = Array(((word1, word2), 1), ((word1, word3), 1), ((word2, word3), 1)...
But my actual output is the following:
coTerm.collect
res51: Array[Unit] = Array((), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), ()....
I don't understand why I can call println inside .map to print the word pairs, but the map itself does not emit them as output.
Upvotes: 3
Views: 1273
Reputation: 2333
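The cause is that you are not actually returning any records from your map. A for without yield is a plain loop whose result is Unit, which is why you get an Array[Unit] full of (). Use yield to return the records from the for, as shown below: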
val coTerm = words.map { line =>
  for {
    i <- 0 until line.length
    j <- (i + 1) until line.length
  } yield {
    ((line(i), line(j)), 1)
  }
}
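To make the difference concrete, here is a minimal sketch that uses plain Scala collections instead of an RDD, so it runs without Spark. The object and method names (CoOccurrenceSketch, pairsWithoutYield, pairsWithYield, countPairs) are made up for illustration and are not part of your code or of any library.

```scala
object CoOccurrenceSketch {
  // Without `yield`, the for is rewritten into nested foreach calls,
  // so the tuple is built and then thrown away; the result is Unit.
  def pairsWithoutYield(line: Array[String]): Unit =
    for {
      i <- 0 until line.length
      j <- (i + 1) until line.length
    } {
      ((line(i), line(j)), 1)
    }

  // With `yield`, the for is rewritten into flatMap/map,
  // so every tuple ends up in the returned collection.
  def pairsWithYield(line: Array[String]): IndexedSeq[((String, String), Int)] =
    for {
      i <- 0 until line.length
      j <- (i + 1) until line.length
    } yield ((line(i), line(j)), 1)

  // Flatten the per-line pairs and sum the 1s for each distinct pair.
  def countPairs(lines: Seq[Array[String]]): Map[(String, String), Int] =
    lines
      .flatMap(pairsWithYield)
      .groupBy(_._1)
      .map { case (pair, hits) => pair -> hits.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    // Hypothetical, already tokenized and sorted input lines.
    val lines = Seq(Array("a", "b", "c"), Array("a", "b"))

    println(lines.map(pairsWithoutYield)) // List((), ())
    println(lines.map(pairsWithYield))    // one collection of pairs per line
    println(countPairs(lines))            // pair -> number of co-occurrences
  }
}
```

Note that with map, even the yield version gives you one collection of pairs per line. If what you want is a flat RDD of ((word1, word2), 1) records, as in your expected output, it may be worth trying words.flatMap instead of words.map, and then something like reduceByKey(_ + _) to sum the counts per pair.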
Upvotes: 2