Reputation: 3288
I have a DataFrame whose rows need to be saved to different target tables. Right now, I'm finding the unique combinations of parameters that determine the target table, then iterating over them, filtering the DataFrame, and writing.
Something similar to this:
df = spark.read.json(directory).repartition('client', 'region')

# collect the distinct (client, region) pairs that determine the target tables
unique_clients_regions = [(group.client, group.region) for group in df.select('client', 'region').distinct().collect()]

for client, region in unique_clients_regions:
    (df
     .filter(f"client = '{client}' and region = '{region}'")
     .select(
         ...
     )
     .write.mode("append")
     .saveAsTable(f"{client}_{region}_data")
    )
Is there a way to map the write operation to different groupBy groups instead of having to iterate over the distinct set? I made sure to repartition by client and region to try to speed up the performance of the filter.
Upvotes: 0
Views: 328
Reputation: 15293
I cannot, in good conscience, advise you to do anything with this solution. That is really bad data architecture.
You should have only one table, partitioned by client and region. That will create a separate folder for each client/region pair, and in the end you only need a single write, with no loop and no collect.
# single write, partitioned on disk by client and region
spark.read.json(directory).write.saveAsTable(
    "data",
    mode="append",
    partitionBy=['client', 'region']
)
Upvotes: 1