Reputation: 646
I have a requirement to load about 10 million records from BigQuery into Firestore every day. What is the fastest way to do this?
A Cloud Function with parallel individual writes is an option (according to the link below), but in that case parallelizing reads from the BigQuery table would be a challenge.
What is the fastest way to write a lot of documents to Firestore?
Would Dataflow work in this scenario, i.e. reading and writing the data through Dataflow?
Upvotes: 1
Views: 736
Reputation: 11041
Dataflow works in this case. It lets you parallelize how you read data from BigQuery and write it into Firestore.
There is work in progress to add a Firestore sink to Beam; it should be available for the Java SDK in Beam 2.31.0. See https://github.com/apache/beam/pull/14261
In the meantime, you may be able to roll your own. In Python it would look something like this:
(p
 | beam.io.ReadFromBigQuery(...)
 # Batches of 50-500 elements help with Firestore write throughput
 # (500 is the limit of a single Firestore batched write).
 # BatchElements batches an unkeyed PCollection, unlike GroupIntoBatches,
 # which needs keyed input.
 | beam.BatchElements(min_batch_size=50, max_batch_size=500)
 | beam.ParDo(WriteToFirestoreDoFn(project)))
where you write your own WriteToFirestoreDoFn that does something like this (the collection name and document ID field below are placeholders):
from apache_beam import DoFn
from google.cloud import firestore

class WriteToFirestoreDoFn(DoFn):
    def __init__(self, project):
        self.client = None
        self.project = project

    def process(self, batch):
        # Create the client lazily so it is not pickled with the DoFn.
        if not self.client:
            self.client = firestore.Client(project=self.project)
        writer = self.client.batch()  # one batched write per batch (max 500 ops)
        for row in batch:
            # assumes each BigQuery row dict has an 'id' field to use as the document ID
            writer.set(self.client.collection('my_collection').document(str(row['id'])), row)
        writer.commit()
This is a little pseudocody, but it should help you get started with what you want.
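Putting the pieces together, a minimal sketch of how the pipeline could be wired up and run on Dataflow follows; the runner options, project, region, bucket, query, and collection name are all placeholder assumptions you would replace with your own values, and it reuses the WriteToFirestoreDoFn sketched above.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder options; running with the DataflowRunner is what gives you
# parallel reads from BigQuery and parallel writes to Firestore.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-gcp-project',
    region='us-central1',
    temp_location='gs://my-bucket/tmp')

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromBigQuery(
           query='SELECT * FROM `my-gcp-project.my_dataset.my_table`',
           use_standard_sql=True)
     | beam.BatchElements(min_batch_size=50, max_batch_size=500)
     | beam.ParDo(WriteToFirestoreDoFn(project='my-gcp-project')))
A single Firestore batched write is capped at 500 operations, which is why the batch size stays in the 50-500 range; for a daily load you would just trigger this job on a schedule (e.g. from Cloud Scheduler or a Dataflow template).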
Upvotes: 1