Reputation: 1368
Basically I have Spark sitting in front of a database and I was wondering how I would go about keeping the DataFrame constantly updated with new data from the backend.
The trivial way I can think of to solve this would be to just re-run the query against the database every couple of minutes, but that is obviously inefficient and would still leave the data stale in the time between updates.
I am not 100% sure whether the database I'm working with guarantees this, but I believe rows are only ever added; existing rows are never modified.
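For reference, the trivial version I have in mind looks roughly like this (the table name `events` and the connection details are just placeholders):

```python
from time import sleep

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("poll-db").getOrCreate()

# Placeholder connection details
jdbc_url = "jdbc:postgresql://dbhost:5432/mydb"
props = {"user": "user", "password": "secret", "driver": "org.postgresql.Driver"}

while True:
    # Re-read the entire table and swap it in as the current view
    df = spark.read.jdbc(jdbc_url, "events", properties=props)
    df.createOrReplaceTempView("events")
    sleep(120)  # every couple of minutes
```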
Upvotes: 0
Views: 568
Reputation: 1257
A DataFrame is an RDD plus a schema plus many other functionalities. By Spark's basic design, RDDs are immutable, so you cannot update a DataFrame after it has been materialized. In your case, you can probably mix streaming and SQL, along the lines of the sketch below:
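Structured Streaming has no built-in JDBC source, so one minimal way to sketch the idea, assuming an append-only table with an auto-increment `id` column (the table name `events`, the column name, and the connection details are all placeholders), is to emulate the stream with a polling loop that pulls only the rows added since the last poll and re-publishes the growing DataFrame to Spark SQL as a temp view:

```python
from time import sleep

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-pull").getOrCreate()

# Placeholder connection details
jdbc_url = "jdbc:postgresql://dbhost:5432/mydb"
props = {"user": "user", "password": "secret", "driver": "org.postgresql.Driver"}

# Initial full load; assumes an auto-increment `id` column
events = spark.read.jdbc(jdbc_url, "events", properties=props).cache()
events.createOrReplaceTempView("events")

while True:
    sleep(60)
    last_id = events.agg(F.max("id")).first()[0] or 0
    # Fetch only rows added since the last poll (JDBC accepts a subquery as the table)
    new_rows = spark.read.jdbc(
        jdbc_url,
        f"(SELECT * FROM events WHERE id > {last_id}) AS t",
        properties=props,
    )
    if new_rows.take(1):
        # DataFrames are immutable, so build a *new* one and republish the view
        events = events.union(new_rows).cache()
        events.createOrReplaceTempView("events")
```

SQL queries against the `events` view always see the latest union, and only the new rows cross the JDBC link on each poll. Note that the union lineage and cached copies grow with every iteration, so a long-running job would want to unpersist old DataFrames and checkpoint or write the accumulated data out periodically.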
Upvotes: 1