Reputation: 1004
i am trying to read file contents from my hdfs for that i am using Source.fromFile(). It is working fine when my file is in local system but throwing error when i am trying to read file from HDFS.
object CheckFile{
def main(args:Array[String]) {
for (line <- Source.fromFile("/user/cloudera/xxxx/File").getLines()) {
println(line)
}
}
}
Error:
java.io.FileNotFoundException: hdfs:/quickstart.cloudera:8080/user/cloudera/xxxx/File (No such file or directory)
i searched but i am not able to find any solutions to this.
Please help
Upvotes: 3
Views: 4092
Reputation: 932
sc.textFile("hdfs://path/to/file.txt").toLocalIterator.toArray.mkString
will give the result as string
Upvotes: 1
Reputation: 853
If you are using Spark you should use SparkContext
to load the files. Source.fromFile
uses the local file system.
Say you have your SparkContext
at sc
,
val fromFile = sc.textFile("hdfs://path/to/file.txt")
Should do the trick. You might have to specify the node address, though.
UPDATE:
To add to the comment. You want to read some data from hdfs and store it as a Scala collection. This is bad practice as the file might contain milions of lines and it will crash due to insufficient amount of memory; you should use RDDs and not built-in Scala collections. Nevertheless, if this is what you want, you could do:
val fromFile = sc.textFile("hdfs://path/to/file.txt").toLocalIterator.toArray
Which would produce a local collection of desired type (Array
in this case).
Upvotes: 4