Satyabrat Kumar

Reputation: 164

Apache Spark Streaming from folder (not HDFS)

I was wondering whether there is a reliable way to create Spark streams from a physical location. I was using 'textFileStream', but it seems to be meant mainly for files in HDFS. The function's definition says "Create an input stream that monitors a Hadoop-compatible filesystem".

Upvotes: 0

Views: 887

Answers (1)

OneCricketeer

Reputation: 191748

Are you implying that HDFS is not a physical location? There are datanode directories that physically exist...

You should be able to use textFile with the file:// URI, but you need to ensure all nodes in the cluster can read from that location.
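As a rough sketch of the same idea applied to the streaming call from the question, the following PySpark snippet points textFileStream at a local directory through an explicit file:// URI. The helper names (local_dir_uri, start_stream), the app name, the local[2] master, and the 10-second batch interval are assumptions made for illustration, not anything from the question. It also assumes the directory is visible at the same path on every node.

```python
from pathlib import Path


def local_dir_uri(path: str) -> str:
    """Turn a local directory path into an explicit file:// URI."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()


def start_stream(directory: str, batch_seconds: int = 10) -> None:
    """Monitor a local directory for new text files (hypothetical helper)."""
    # Imported here so local_dir_uri stays usable without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Assumed settings: local master with two threads, illustrative app name.
    sc = SparkContext(master="local[2]", appName="LocalFolderStream")
    ssc = StreamingContext(sc, batch_seconds)

    # The file: scheme selects the local filesystem instead of HDFS.
    lines = ssc.textFileStream(local_dir_uri(directory))
    lines.pprint()

    ssc.start()
    ssc.awaitTermination()
```

Calling start_stream("/data/incoming") would then print each batch of new lines. Note that the stream only picks up files that appear in the directory after it starts, so files are best moved or renamed into place rather than written there incrementally.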

From the definition of a Hadoop-compatible filesystem:

The selection of which filesystem to use comes from the URI scheme used to refer to it: the prefix hdfs: on any file path means that it refers to an HDFS filesystem; file: to the local filesystem, s3: to Amazon S3, ftp: FTP, swift: OpenStackSwift, ...etc.

There are other filesystems that provide explicit integration with Hadoop through the relevant Java JAR files, native binaries and configuration parameters needed to add a new schema to Hadoop.
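To make the scheme-based selection in the quoted definition concrete, here is a small standard-library sketch that extracts the scheme from a path. It is not how Hadoop resolves filesystems internally. The function name filesystem_scheme and its default argument are hypothetical; the default stands in for whatever fs.defaultFS is configured to on a real cluster.

```python
from urllib.parse import urlparse


def filesystem_scheme(path: str, default: str = "file") -> str:
    """Return the URI scheme of a path, or a default when none is given.

    The default is an assumption for this sketch; on a real cluster the
    fallback comes from the fs.defaultFS setting.
    """
    scheme = urlparse(path).scheme
    return scheme if scheme else default


for p in ["hdfs://namenode:8020/data", "file:///data/incoming", "/data/incoming"]:
    print(p, "->", filesystem_scheme(p))
```

A path with no scheme is the ambiguous case: on a cluster whose default filesystem is HDFS it will not refer to the local disk, which is why spelling out file:// matters.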

Upvotes: 2
