I am new to Scala and HDFS. I am able to read a local file from Scala code, but how do I read a file from HDFS?
import scala.io.Source

object ReadLine {
  def main(args: Array[String]): Unit = {
    if (args.length > 0) {
      // Source.fromFile treats args(0) as a path on the local file system
      for (line <- Source.fromFile(args(0)).getLines())
        println(line)
    }
  }
}
As the argument I passed hdfs://localhost:9000/usr/local/log_data/file1, but it throws a FileNotFoundException. I am definitely missing something. Can anyone help me out here?
The scala.io.Source API cannot read from HDFS. Source.fromFile only reads from the local file system, so an hdfs:// URI is treated as an ordinary local file name.
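A minimal sketch of why the question's program fails, assuming Scala 2.12 or 2.13 and only the standard library: Source.fromFile wraps its argument in a java.io.File, so the HDFS URI is looked up as a local file name, and opening it throws FileNotFoundException. The names LocalOnlyDemo and tryLocalRead are hypothetical.

```scala
import java.io.{File, FileNotFoundException}
import scala.io.Source

object LocalOnlyDemo {
  // The HDFS URI from the question, passed to Source as if it were a local path
  val hdfsUri = "hdfs://localhost:9000/usr/local/log_data/file1"

  // Source.fromFile opens a java.io.File, which knows nothing about URI schemes;
  // a missing file surfaces as FileNotFoundException, returned here as Left
  def tryLocalRead(name: String): Either[String, List[String]] =
    try {
      val src = Source.fromFile(name)
      try Right(src.getLines().toList) finally src.close()
    } catch {
      case e: FileNotFoundException => Left(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    // The "URI" is just a local file name as far as java.io.File is concerned
    println(new File(hdfsUri).getPath)
    println(tryLocalRead(hdfsUri))
  }
}
```

The same helper reads a real local file without problems, which matches the question's observation that local reads work.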
Spark
If you want to read from HDFS, I would recommend using Spark, where you read the file through the SparkContext (sc):
val lines = sc.textFile(args(0)) //args(0) should be hdfs:///usr/local/log_data/file1
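The comment in the Spark line uses hdfs:/// with no host or port, while the question passes hdfs://localhost:9000/. A small standard-library sketch (the object name HdfsUriParts is hypothetical) shows how java.net.URI splits the two forms. As far as I know, Hadoop resolves an hdfs URI with an empty authority against the fs.defaultFS setting of its configuration, so both forms point to the same file when fs.defaultFS is hdfs://localhost:9000.

```scala
import java.net.URI

object HdfsUriParts {
  // Splits an HDFS URI into scheme, namenode (authority) and file path.
  // An empty authority (hdfs:///...) is reported as coming from fs.defaultFS.
  def describe(uri: String): String = {
    val u = new URI(uri)
    val nameNode = Option(u.getAuthority).getOrElse("<fs.defaultFS>")
    s"scheme=${u.getScheme} namenode=$nameNode path=${u.getPath}"
  }

  def main(args: Array[String]): Unit = {
    println(describe("hdfs:///usr/local/log_data/file1"))
    println(describe("hdfs://localhost:9000/usr/local/log_data/file1"))
  }
}
```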
No Spark
If you don't want to use Spark, then you should go with the Hadoop FileSystem API to open the file, combined with standard Java readers such as BufferedReader or InputStreamReader. For example:
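import java.io.{BufferedReader, InputStreamReader}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
val hdfs = FileSystem.get(new URI("hdfs://yourUrl:port/"), new Configuration())
val path = new Path("/path/to/file/")
val reader = new BufferedReader(new InputStreamReader(hdfs.open(path), "UTF-8"))
def readLines = Iterator.continually(reader.readLine()).takeWhile(_ != null) // readLine returns null at end of file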
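When reading line by line, readLine returns null at end of file, so the lazy sequence of lines has to stop at the first null; a bare Stream.continually(stream.readLine) would otherwise go on forever. The sketch below, with the hypothetical name ReadUntilNull, exercises that pattern on an in-memory BufferedReader standing in for the stream returned by hdfs.open, so it runs with the standard library alone.

```scala
import java.io.{BufferedReader, StringReader}

object ReadUntilNull {
  // Lazily pull lines and stop at the null that readLine returns at end of input
  def lines(reader: BufferedReader): Iterator[String] =
    Iterator.continually(reader.readLine()).takeWhile(_ != null)

  def main(args: Array[String]): Unit = {
    // A StringReader stands in for the HDFS input stream so no cluster is needed
    val reader = new BufferedReader(new StringReader("first\nsecond\nthird"))
    try lines(reader).foreach(println) finally reader.close()
  }
}
```

An Iterator is single-pass, which fits reading a stream once; Stream (LazyList in Scala 2.13) would memoize every line it has read.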