Vortex

Reputation: 789

Spark RDD append

In Spark, I loaded a data set as an RDD and would like to infrequently append streaming data to it. I know RDDs are immutable because that simplifies locking, etc. Are there other approaches to processing static and streaming data together as one?

A similar question has been asked before: Spark : How to append to cached rdd?

Upvotes: 0

Views: 65

Answers (1)

nairbv

Reputation: 4323

Have a look at http://spark.apache.org/streaming/.

With Spark Streaming, you get a data structure (a DStream) representing a collection of RDDs you can iterate over. It can listen to a Kafka queue, file system, etc. to find new data to include in the next RDD.

Or, if you only do these "appends" rarely, you can union two RDDs with the same structure to get a new combined RDD.

Upvotes: 1
