Sandeep Das

Reputation: 1050

Does Spark Streaming work with both "cp" and "mv"?

I am using Spark Streaming.

My program continuously reads a stream from a Hadoop folder. The problem is that if I copy files into that folder (hadoop fs -copyFromLocal), the Spark job picks them up, but if I move them (hadoop fs -mv /hadoopsourcePath/* /destinationPath/), nothing happens.

Is this a limitation of Spark Streaming?

I have another question related to Spark Streaming: can Spark Streaming pick up only specific files?
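On picking specific files: the Scala API's `StreamingContext.fileStream` has an overload that takes a `filter: Path => Boolean` (plus a `newFilesOnly` flag), and by default `FileInputDStream` skips hidden files whose names start with `.`. As far as I can tell, a custom filter replaces that default, so the sketch below re-applies the hidden-file rule itself. This is a pure-Python model of the selection step only, not Spark code; `make_filter` and the `*.txt` pattern are hypothetical examples.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Callable


def make_filter(pattern: str) -> Callable[[str], bool]:
    """Hypothetical helper: build a path predicate like the one fileStream accepts."""
    def accept(path: str) -> bool:
        name = PurePosixPath(path).name
        if name.startswith("."):  # mirror Spark's default: skip hidden files
            return False
        return fnmatch.fnmatch(name, pattern)  # keep only names matching the pattern
    return accept


only_txt = make_filter("*.txt")
candidates = ["/input/f.txt", "/input/.hidden.txt", "/input/data.csv"]
print([p for p in candidates if only_txt(p)])  # prints ['/input/f.txt']
```

In Scala the equivalent predicate would be passed as the `filter` argument of `fileStream`; only files for which it returns true are considered at each batch.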

Upvotes: 0

Views: 454

Answers (1)

Sandeep Das

Reputation: 1050

Got it. It works in Spark 1.5, but it only picks up files whose modification timestamp is recent (close to the current batch time); files with older timestamps are ignored.

For example:

Temp folder: file f.txt (timestamp t1, i.e. when the file was created)

Spark input folder: /input

When you do a move (hadoop fs -mv /temp/f.txt /input), Spark will not pick the file up, because the move keeps its original timestamp t1.

But if you change the timestamp of the moved file after moving it, Spark will pick it up.
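The behaviour can be reproduced without Spark. Below is a simplified, pure-Python model of the age check in `FileInputDStream.isNewFile`: a file is selected only if its modification time is newer than the current batch time minus a remember window, is not in the future, and has not been selected before. The 60-second window is an assumption standing in for Spark's minimum remember duration; `HdfsFile`, `is_new_file` and all timestamps are hypothetical. Real Spark also applies the path filter and, when `newFilesOnly` is true, a start-time threshold, both omitted here. A copy (`hadoop fs -copyFromLocal`) writes a new file whose modification time is the time of the copy, while `hadoop fs -mv` keeps t1.

```python
from dataclasses import dataclass

# Assumed remember window (ms); stands in for Spark's minimum remember duration.
REMEMBER_MS = 60_000


@dataclass
class HdfsFile:
    path: str
    mod_time_ms: int  # modification time, epoch milliseconds


def is_new_file(f: HdfsFile, batch_time_ms: int,
                remember_ms: int = REMEMBER_MS, seen=frozenset()) -> bool:
    """Simplified model of FileInputDStream's modification-time check."""
    threshold = batch_time_ms - remember_ms
    if f.mod_time_ms <= threshold:      # too old: outside the remember window
        return False
    if f.mod_time_ms > batch_time_ms:   # "future" file: left for a later batch
        return False
    return f.path not in seen           # each file is selected only once


batch_time = 1_000_000_000              # hypothetical current batch time (ms)
t1 = batch_time - 3_600_000             # f.txt created in /temp an hour earlier

moved = HdfsFile("/input/f.txt", t1)                     # mv keeps mod time t1
copied = HdfsFile("/input/f.txt", batch_time - 5_000)    # copy: fresh mod time
touched = HdfsFile("/input/f.txt", batch_time - 1_000)   # mv, then timestamp updated

print(is_new_file(moved, batch_time))    # prints False
print(is_new_file(copied, batch_time))   # prints True
print(is_new_file(touched, batch_time))  # prints True
```

This is why the moved file is skipped: its timestamp t1 falls before the window, so only a file with a fresh timestamp (a copy, or a moved file whose timestamp was updated) passes.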

I had to check the Spark source code to figure this out:

https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala

Upvotes: 1
