Akhil

Reputation: 646

Fast Way of Writing Data to Firestore from BigQuery

I have a requirement to load about 10 million records from BigQuery into Firestore every day. What is the fastest way to do this?

A Cloud Function doing parallel individual writes is an option (per the link below), but in that case parallelizing the reads from the BigQuery table would be a challenge.

What is the fastest way to write a lot of documents to Firestore?

Does Dataflow work in this scenario, i.e. reading and writing the data through Dataflow?

Upvotes: 1

Views: 736

Answers (1)

Pablo

Reputation: 11041

Dataflow works in this case. It lets you parallelize how you read data from BigQuery and write it into Firestore.

There is work in progress to add a Firestore sink to Beam. It should be available for the Java SDK in Beam 2.31.0; see https://github.com/apache/beam/pull/14261

In the meantime, you may be able to roll your own. In Python it would look something like this:

(p
 | ReadFromBigQuery(...)
 | Map(lambda row: (random.randint(0, 99), row))  # GroupIntoBatches needs keyed input; a shard key lets batches form
 | GroupIntoBatches(50)  # Batches of 50-500 elements will help with throughput
 | MapTuple(lambda _, rows: list(rows))  # drop the shard key, keep the batch of rows
 | ParDo(WriteToFirestoreDoFn(firestore_info)))

Where you write your own WriteToFirestoreDoFn that does something like this:

from apache_beam import DoFn
from google.cloud import firestore

class WriteToFirestoreDoFn(DoFn):
  def __init__(self, firestore_info):
    # firestore_info is e.g. the GCP project id; the client is created lazily so it isn't pickled with the DoFn.
    self.client = None
    self.firestore_info = firestore_info

  def process(self, batch):
    if not self.client:
      self.client = firestore.Client(project=self.firestore_info)
    # One batched write per bundle of rows; 'my_collection' and the 'id' field are placeholders for your schema.
    writer = self.client.batch()
    for row in batch:
      writer.set(self.client.collection('my_collection').document(str(row['id'])), row)
    writer.commit()

This is a little pseudocody, but it should help you get started with what you want.
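
If it helps, here is a rough sketch of how this could be wired up end to end and submitted to Dataflow. The project, region, bucket, table, collection name and 'id' field are all placeholders for your own setup, and the random shard key is only there because GroupIntoBatches batches elements per key:

import random

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
  options = PipelineOptions(
      runner='DataflowRunner',
      project='my-project',
      region='us-central1',
      temp_location='gs://my-bucket/tmp')
  with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromBigQuery(table='my-project:my_dataset.my_table')
     | 'Shard' >> beam.Map(lambda row: (random.randint(0, 99), row))
     | 'Batch' >> beam.GroupIntoBatches(500)
     | 'DropKey' >> beam.MapTuple(lambda _, rows: list(rows))
     | 'Write' >> beam.ParDo(WriteToFirestoreDoFn('my-project')))

if __name__ == '__main__':
  run()

Firestore batched writes are capped at 500 operations, so keep the batch size at or below that.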

Upvotes: 1
