Maksym

Reputation: 4584

Apache Spark read file as a stream from HDFS

How can I read a file as a stream from HDFS using Apache Spark in Java? I don't want to read the whole file; I want a file stream so that I can stop reading when some condition is met. How can I do this with Apache Spark?

Upvotes: 7

Views: 3474

Answers (1)

Hutashan Chandrakar

Reputation: 425

You can stream files from HDFS with the fileStream method of a StreamingContext (ssc), which monitors a directory for files:

val ssc = new StreamingContext(sparkConf, Seconds(batchTime))

val dStream = ssc.fileStream[LongWritable, Text, TextInputFormat](streamDirectory, (x: Path) => true, newFilesOnly = false)

In the API above, the filter parameter is a function that selects which paths to process.

If your condition depends on the data rather than on the file path or name, you need to stop the streaming context once the condition is satisfied.

For this you need two threads:

1) In the first thread, keep checking whether the streaming context has been stopped; if it has, notify the other thread to wait and create a new streaming context.

2) In the second thread, check the condition, and stop the streaming context when it is satisfied.
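A minimal sketch of the stop-on-condition idea, assuming a simplified variant of the approach above: the batch callback only sets a flag, and the main thread acts as the monitor that polls the flag and stops the context, so no restart of the context is shown. The application name, the 10-second batch interval, the directory path, and the "STOP" marker are all hypothetical placeholders, not values from the question.

```scala
import java.util.concurrent.atomic.AtomicBoolean

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopOnCondition {
  def main(args: Array[String]): Unit = {
    // Hypothetical application name and batch interval.
    val sparkConf = new SparkConf().setAppName("StopOnCondition")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Flag shared between the batch callback and the monitoring loop.
    val stopRequested = new AtomicBoolean(false)

    // Hypothetical directory; the filter accepts every path.
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](
        "hdfs:///path/to/streamDirectory",
        (_: Path) => true,
        newFilesOnly = false)
      .map { case (_, text) => text.toString }

    // The body of foreachRDD runs on the driver once per batch.
    // Assumed condition: some line contains the marker "STOP".
    lines.foreachRDD { rdd =>
      if (!rdd.filter(_.contains("STOP")).isEmpty()) {
        stopRequested.set(true)
      }
    }

    ssc.start()

    // Monitoring loop: wake up every second and stop the context
    // once the callback has requested it.
    var terminated = false
    while (!terminated) {
      terminated = ssc.awaitTerminationOrTimeout(1000L)
      if (!terminated && stopRequested.get()) {
        ssc.stop(stopSparkContext = true, stopGracefully = true)
        terminated = true
      }
    }
  }
}
```

The flag is set inside foreachRDD and the stop call is made from the monitoring loop, rather than calling ssc.stop directly inside the batch callback, which is commonly advised against. Note that the condition is only evaluated once per batch, so the stream stops at a batch boundary and not in the middle of a file.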

Please let me know if you need further explanation.

Upvotes: 1
