Reputation: 2665
Using Spark 2.2.0 on OS X High Sierra. I'm running a Spark Streaming application to read a local file:
val lines = ssc.textFileStream("file:///Users/userName/Documents/Notes/MoreNotes/sampleFile")
lines.print()
This gives me
org.apache.spark.streaming.dstream.FileInputDStream logWarning - Error finding new files
java.lang.NullPointerException
at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
The file exists, and I can read it with SparkContext (sc) from spark-shell in the terminal. For some reason, reading it through the IntelliJ application with Spark Streaming does not work. Any ideas appreciated!
Upvotes: 3
Views: 2347
Reputation: 1
Spark Streaming will not read old files, so first run the spark-submit command and then create the local file in the specified directory. Make sure that in the spark-submit command you give only the directory name, not the file name. Below is a sample command; here I pass the directory name to the Spark job as its first parameter. You can specify this path in your Scala program as well (see the sketch after the command).
spark-submit --class com.spark.streaming.streamingexample.HdfsWordCount --jars /home/cloudera/pramod/kafka_2.12-1.0.1/libs/kafka-clients-1.0.1.jar --master local[4] /home/cloudera/pramod/streamingexample-0.0.1-SNAPSHOT.jar /pramod/hdfswordcount.txt
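The HdfsWordCount class itself is not shown above; here is a minimal sketch of what it might look like, assuming the monitored directory arrives as args(0) from the command line (the class body and batch interval are assumptions, not from the answer):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical reconstruction of the HdfsWordCount class named in the
// spark-submit command above; the real implementation is not shown.
object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10s batches is an assumption

    // args(0) is the directory to monitor, passed as the first job argument
    val lines = ssc.textFileStream(args(0))
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}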
Upvotes: 0
Reputation: 45309
Quoting the doc comments of textFileStream:
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
@param directory HDFS directory to monitor for new file
So the method expects the path to a directory as its parameter, and I believe this should avoid that error:
ssc.textFileStream("file:///Users/userName/Documents/Notes/MoreNotes/")
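For completeness, a minimal runnable sketch around that call (the local[2] master and 5-second batch interval are assumptions, not from the question):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MonitorNotesDir {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("MonitorNotesDir")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Monitor the directory, not an individual file
    val lines = ssc.textFileStream("file:///Users/userName/Documents/Notes/MoreNotes/")
    lines.print()

    ssc.start()
    // While this runs, move (do not copy or append to) new files into
    // MoreNotes/ from the same filesystem, e.g. from another terminal:
    //   mv /tmp/sampleFile /Users/userName/Documents/Notes/MoreNotes/
    ssc.awaitTermination()
  }
}

Note that only files moved into the directory after the stream starts will be picked up, per the "moving" requirement quoted above.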
Upvotes: 2