user2731629

Reputation: 412

Read a file from HDFS and assign the contents to string

In Scala, how do I read a file in HDFS and assign its contents to a variable? I know how to read a file and I am able to print it, but if I try to assign the contents to a string, I get Unit, i.e. (), instead. Below is the code I tried.

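import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val config = new Configuration()
val dfs = FileSystem.get(config)
val snapshot_file = "/path/to/file/test.txt"
val stream = dfs.open(new Path(snapshot_file))
// Lazily read lines until readLine returns null (end of file)
def readLines = Stream.cons(stream.readLine, Stream.continually(stream.readLine))
readLines.takeWhile(_ != null).foreach(line => println(line))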

The above code prints the output correctly. But if I try to assign the output to a string, I don't get the file contents:

val snapshot_id = readLines.takeWhile(_ != null).foreach(line => println(line))
snapshot_id: Unit = ()

What is the correct way to assign the contents to a variable?

Upvotes: 4

Views: 5916

Answers (2)

Ram Ghadiyaram

Reputation: 29155

I used org.apache.commons.io.IOUtils.toString to convert the stream into a string:

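import org.apache.commons.io.IOUtils
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, LocalFileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem

def getfileAsString(file: String): String = {
  val config: Configuration = new Configuration()
  // Explicitly map the hdfs:// and file:// schemes to their FileSystem implementations
  config.set("fs.hdfs.impl", classOf[DistributedFileSystem].getName)
  config.set("fs.file.impl", classOf[LocalFileSystem].getName)
  val dfs = FileSystem.get(config)
  val filePath: FSDataInputStream = dfs.open(new Path(file))
  // logInfo presumably came from a logging trait (e.g. Spark's Logging); println keeps this self-contained
  println("file.available " + filePath.available)
  // Read the whole stream into one String, then close the stream
  try IOUtils.toString(filePath, "UTF-8")
  finally filePath.close()
}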
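If Apache Commons IO isn't on the classpath, the same whole-file read can be done with the JDK alone. This is a minimal sketch, assuming Java 9 or later (for InputStream.readAllBytes) and UTF-8 content; the object name StreamToString is hypothetical. Because FSDataInputStream is a java.io.InputStream, the stream returned by dfs.open(...) can be passed straight in.

```scala
import java.io.InputStream
import java.nio.charset.StandardCharsets

// Hypothetical helper object, not part of Hadoop or Commons IO
object StreamToString {
  // Reads the entire stream into memory, decodes it as UTF-8, and always closes the stream.
  // InputStream.readAllBytes requires Java 9+.
  def readAll(in: InputStream): String =
    try new String(in.readAllBytes(), StandardCharsets.UTF_8)
    finally in.close()
}
```

For example, StreamToString.readAll(dfs.open(new Path(file))) returns the file contents as a String. Like IOUtils.toString, it holds the whole file in memory, so it suits small files such as the snapshot file in the question.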

Upvotes: 1

philantrovert

Reputation: 10082

You need to use mkString. foreach returns Unit no matter what function you pass it (here println), so () is what ends up in your variable; mkString instead joins the elements into a String.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

val hdfs = FileSystem.get(new java.net.URI("hdfs://namenode:port/"), new Configuration())
val path = new Path("/user/cloudera/file.txt")
val stream = hdfs.open(path)
// A Source iterates over characters, so go through getLines() and join whole lines
val snapshot_id: String = try Source.fromInputStream(stream, "UTF-8").getLines().mkString("\n") finally stream.close()
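To see why the question's variable ends up as (), and why joining a Source directly goes wrong, here is a minimal sketch that runs without a cluster. An in-memory BufferedReader stands in for the HDFS stream (an assumption for illustration), Iterator.continually replaces the question's Stream.cons construction, and the object and method names are hypothetical.

```scala
import java.io.{BufferedReader, StringReader}
import scala.io.Source

// Hypothetical demo object; the reader stands in for dfs.open(...)
object CollectVsPrint {
  // In-memory reader over the given text, used instead of an HDFS stream
  def reader(text: String): BufferedReader = new BufferedReader(new StringReader(text))

  // Same idea as the question's readLines: read lines lazily until readLine returns null
  def lines(r: BufferedReader): Iterator[String] =
    Iterator.continually(r.readLine()).takeWhile(_ != null)

  // foreach runs println for its side effect and returns Unit, so a val bound to it holds ()
  def printed(text: String): Unit = lines(reader(text)).foreach(line => println(line))

  // mkString folds the lines into a single String, which is what the variable should hold
  def collected(text: String): String = lines(reader(text)).mkString("\n")

  // Pitfall: a Source is an Iterator[Char], so mkString("\n") inserts a newline between characters
  def charsJoined(text: String): String = Source.fromString(text).mkString("\n")

  // Going through getLines() joins whole lines instead of single characters
  def linesJoined(text: String): String = Source.fromString(text).getLines().mkString("\n")
}
```

Calling printed prints each line and hands back (), which matches the question's REPL output, while collected returns the lines as one String. Note that getLines() drops line terminators, so a trailing newline in the file is not preserved; Source.fromInputStream(stream).mkString (no separator) keeps the contents byte-for-byte.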

Upvotes: 6
