Reputation: 18066
The Cloud Dataflow page implies that this is possible, but I haven't found a way to observe change events in the Google Cloud Datastore docs. How is it done?
Upvotes: 4
Views: 1242
Reputation: 8178
As far as I am aware, Cloud Datastore integrates with Dataflow only through DatastoreIO (now based on DatastoreV1), which can be used only as a bounded source for batch jobs.
I looked for an alternative that would let you use Datastore, directly or indirectly, as an unbounded source, for instance a Pub/Sub topic to which Datastore changes are published and from which Dataflow can consume them. I do not think that is viable, though, because, as you said, there is no easy way to detect changes (entities being added, modified, etc.) in Datastore.
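If you control every code path that writes to Datastore, the indirect route could in principle be approximated by having the application publish a change message itself on each write. Below is a minimal sketch of that dual-write idea, using in-memory stand-ins instead of the real client libraries. `PublishingStore`, `ChangeEvent`, and the `publish` callback are hypothetical names for illustration, not part of Datastore, Pub/Sub, or Dataflow.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ChangeEvent:
    """Hypothetical change record the application emits on each write."""
    kind: str                    # entity kind, e.g. "Task"
    key: str                     # entity identifier within the kind
    op: str                      # "upsert" or "delete"
    data: Optional[dict] = None  # entity properties for upserts

    def to_message(self) -> bytes:
        # Serialize to bytes, the shape a message-bus payload usually takes.
        payload = {"kind": self.kind, "key": self.key,
                   "op": self.op, "data": self.data}
        return json.dumps(payload, sort_keys=True).encode("utf-8")


class PublishingStore:
    """In-memory stand-in for a datastore wrapper (assumption, not a real API).

    Every write goes through this class, which also hands a change message
    to the injected `publish` callback. In a real setup the dict would be
    replaced by datastore calls and `publish` by a Pub/Sub publisher.
    """

    def __init__(self, publish: Callable[[bytes], None]) -> None:
        self._entities: Dict[Tuple[str, str], dict] = {}
        self._publish = publish

    def put(self, kind: str, key: str, data: dict) -> None:
        self._entities[(kind, key)] = dict(data)
        self._publish(ChangeEvent(kind, key, "upsert", dict(data)).to_message())

    def delete(self, kind: str, key: str) -> None:
        # Only publish if something was actually removed.
        if self._entities.pop((kind, key), None) is not None:
            self._publish(ChangeEvent(kind, key, "delete").to_message())


if __name__ == "__main__":
    published = []                       # stands in for a topic
    store = PublishingStore(published.append)
    store.put("Task", "t1", {"done": False})
    store.delete("Task", "t1")
    for message in published:
        print(message.decode("utf-8"))
```

Note the limits of this approach: the write and the publish are not atomic, so a failure between the two can lose or duplicate an event, and any write that bypasses the wrapper (console edits, other services) is never seen. That is why I would not call it a real way of observing Datastore changes.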
For now, I have filed an internal request to improve the documentation: either change the image so that it no longer implies that Cloud Datastore can be used with a streaming pipeline, or clarify this use case.
Upvotes: 2