Reputation: 9
I'm processing info in Google Cloud Dataflow. We tried to use JPA to insert or update the data in our MySQL database, but these queries shut down our server, so we've decided to change our approach...
I want to generate a MySQL dump or .sql file so we can write out the new info processed through Dataflow. I want to know if there is a built-in way to do this, or do I have to do it myself?
Let me explain a little more: we have input from an XML file, which we process into Java classes. We also have a JSON dump of the database, so we can see what is already online without making so many calls. With that in mind, we compare the new info with the info we already have and decide whether each record is new or just an update.
How can I do this via Java/Maven? I need code to generate this file...
Upvotes: 0
Views: 350
Reputation: 1729
Yes, Cloud Dataflow processes data in parallel on many machines. As such, it is not very surprising that other services may not be able to keep up or that some quotas are hit.
Depending on your specific use case, you may be able to slow/throttle Dataflow down without changing your approach. One might limit the number of workers, limit parallelism, use the IntraBundleParallelization API, etc. This might be a better path overall. We are also working on more explicit ways to throttle Dataflow.
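For example, here is a minimal sketch of capping the worker pool through pipeline options. It assumes the Dataflow Java SDK 1.x and a pipeline launched from a `main` method; the class name and worker count are just placeholders:

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

public class ThrottledPipeline {
  public static void main(String[] args) {
    // Read the standard options from the command line, then cap the worker pool
    // so the downstream MySQL server sees fewer concurrent writers.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setMaxNumWorkers(3);

    Pipeline p = Pipeline.create(options);
    // ... attach your existing transforms here ...
    p.run();
  }
}
```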
Now, it is not really feasible for any system to automatically generate a .sql file for your database. However, it should be pretty straightforward to use primitives like ParDo and TextIO.Write to generate such a file via a Dataflow pipeline.
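A minimal sketch of that approach, again assuming the Dataflow Java SDK 1.x. The `Record` class, its fields, the `items` table, and the output bucket are hypothetical placeholders for your own classes and schema:

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.coders.SerializableCoder;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Create;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

import java.io.Serializable;

public class SqlFilePipeline {

  /** Hypothetical stand-in for the classes you build from the XML input. */
  static class Record implements Serializable {
    final long id;
    final String name;
    final boolean isNew; // result of comparing against the JSON dump

    Record(long id, String name, boolean isNew) {
      this.id = id;
      this.name = name;
      this.isNew = isNew;
    }
  }

  /** Formats each record as one SQL statement: INSERT for new rows, UPDATE otherwise. */
  static class ToSqlStatement extends DoFn<Record, String> {
    @Override
    public void processElement(ProcessContext c) {
      Record r = c.element();
      String name = r.name.replace("'", "''"); // naive escaping, for illustration only
      if (r.isNew) {
        c.output(String.format("INSERT INTO items (id, name) VALUES (%d, '%s');", r.id, name));
      } else {
        c.output(String.format("UPDATE items SET name = '%s' WHERE id = %d;", name, r.id));
      }
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Replace Create.of(...) with the steps that parse your XML and compare it
    // against the JSON dump; it is only here to keep the sketch self-contained.
    p.apply(Create.of(new Record(1, "alpha", true), new Record(2, "beta", false)))
        .setCoder(SerializableCoder.of(Record.class))
        .apply(ParDo.of(new ToSqlStatement()))
        // withoutSharding() produces a single statements.sql file (written by one worker).
        .apply(TextIO.Write.to("gs://my-bucket/output/statements.sql").withoutSharding());

    p.run();
  }
}
```

You can then download the resulting file and apply it to MySQL yourself (for example, with the mysql command-line client), which keeps the load on the database entirely under your control.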
Upvotes: 2