Reputation: 107
I am trying to read a file that looks like this:
you 0.0432052044116
i 0.0391075831328
the 0.0328010698268
to 0.0237549924919
a 0.0209682886489
it 0.0198104294359
I'd like to store it in an RDD of (key, value) pairs, e.g. (you, 0.0432). For the moment I have only written this:
import scala.io.Source
import java.io.{FileNotFoundException, IOException}

val filename = "freq2.txt"
try {
  for (line <- Source.fromFile(filename).getLines()) {
    val tuple = line.split(" ")
    val key = tuple(0)
    val words = tuple(1)
    println(s"${key}")
    println(s"${words}")
  }
} catch {
  case ex: FileNotFoundException => println("Couldn't find that file.")
  case ex: IOException => println("Had an IOException trying to read that file")
}
But I don't know how to store the data...
Upvotes: 1
Views: 7778
Reputation: 9035
You can directly read the data into an RDD:
val FIELD_SEP = " " // or whatever separator you have
val dataset = sparkContext.textFile(sourceFile).map(line => {
  // take the first two tokens; `other` collects any remaining fields
  val word :: score :: other = line.split(FIELD_SEP).toList
  (word, score)
})
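If you want numeric scores rather than strings, a minimal variation (assuming the same sparkContext, sourceFile and FIELD_SEP as above) converts the second field to a Double, which also lets you query the resulting pair RDD, e.g. with lookup:
// Sketch: same parsing as above, with the score turned into a Double
val scores = sparkContext.textFile(sourceFile).map { line =>
  val fields = line.split(FIELD_SEP)
  (fields(0), fields(1).toDouble)
}
// lookup returns every value stored under the given key
scores.lookup("you").foreach(println) // prints 0.0432052044116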
Upvotes: 6
Reputation: 11
val filename = "freq2.txt"
// textFile already returns one record per line, so no extra line splitting is needed
sc.textFile(filename).map(x => {
  val data = x.trim().split(" ")
  (data(0), data(1))
}).foreach(y => println(y))
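Note that when this runs on a cluster, the println inside foreach executes on the executors, not the driver. A minimal sketch (same sc and filename) that brings the pairs back to the driver before printing:
val pairs = sc.textFile(filename).map { x =>
  val data = x.trim().split(" ")
  (data(0), data(1))
}
// collect() pulls the (word, score) pairs back to the driver for printing
pairs.collect().foreach(println)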
Upvotes: 1