animal

Reputation: 1004

Source.fromFile not working for HDFS file path

I am trying to read file contents from HDFS using Source.fromFile(). It works fine when the file is on the local file system, but it throws an error when I try to read a file from HDFS.

import scala.io.Source

object CheckFile {
    def main(args: Array[String]): Unit = {
        for (line <- Source.fromFile("/user/cloudera/xxxx/File").getLines()) {
            println(line)
        }
    }
}

Error:

java.io.FileNotFoundException: hdfs:/quickstart.cloudera:8080/user/cloudera/xxxx/File (No such file or directory)

I searched but was not able to find any solution to this.

Please help.

Upvotes: 3

Views: 4092

Answers (2)

dileepVikram

Reputation: 932

sc.textFile("hdfs://path/to/file.txt").toLocalIterator.toArray.mkString will give the result as a string.
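For context, a minimal sketch of the same approach, assuming an existing SparkContext named sc and an illustrative HDFS path:

// Minimal sketch: assumes an existing SparkContext `sc`; the path is illustrative.
val contents: String = sc
  .textFile("hdfs://path/to/file.txt") // RDD[String], one element per line
  .toLocalIterator                     // iterate over the RDD from the driver
  .toArray                             // materialize all lines as a local Array[String]
  .mkString                            // concatenate the lines into one String (no separator)

println(contents)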

Upvotes: 1

sebszyller

Reputation: 853

If you are using Spark, you should use SparkContext to load the files; Source.fromFile reads from the local file system.

Say you have your SparkContext named sc:

val fromFile = sc.textFile("hdfs://path/to/file.txt")

Should do the trick. You might have to specify the node address, though.
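
A slightly fuller sketch of that, assuming Spark is on the classpath; the app name, master, namenode address and path below are placeholders, not values taken from the question:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a standalone Spark app; master, namenode host:port and path are placeholders.
object ReadFromHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadFromHdfs").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Fully qualified HDFS URI: scheme, namenode host:port, then the file path.
    val fromFile = sc.textFile("hdfs://namenode-host:8020/path/to/file.txt")

    fromFile.take(10).foreach(println) // print a small sample instead of the whole file

    sc.stop()
  }
}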

UPDATE:

To add to the comments: you want to read some data from HDFS and store it as a Scala collection. This is bad practice, as the file might contain millions of lines and the program would crash due to insufficient memory; you should use RDDs rather than built-in Scala collections. Nevertheless, if this is what you want, you could do:

val fromFile = sc.textFile("hdfs://path/to/file.txt").toLocalIterator.toArray

which would produce a local collection of the desired type (Array in this case).
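
To illustrate the RDD-based alternative recommended above, a sketch where the work stays distributed and only a small aggregate comes back to the driver (the path and filtering logic are placeholders):

// Sketch: keep the data as an RDD; only a single Long is returned to the driver.
val lines = sc.textFile("hdfs://path/to/file.txt")

val nonEmptyCount = lines
  .filter(_.trim.nonEmpty) // transformation: runs on the executors
  .count()                 // action: brings back just one number

println(s"Non-empty lines: $nonEmptyCount")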

Upvotes: 4
