Reputation: 11
I want to load data from an on-premises SQL Server to Blob Storage with a copy activity in ADF. The target file is parquet, and it is about 5 GB in size.
The pipeline works well and writes a single parquet file. Now I need to split that file into multiple parquet files to optimize loading the data with PolyBase, and for other uses.
With Spark we can partition a file into multiple files with this syntax:
df.repartition(5).write.parquet("path")
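For illustration, here is a minimal end-to-end sketch of that Spark approach against blob storage; the storage account, container, and paths are placeholders, and it assumes the blob access key has already been set in the Spark configuration:

# Minimal PySpark sketch: re-read the single parquet file produced by the
# ADF copy activity and rewrite it as multiple parquet files.
# "mystorageaccount", "mycontainer" and the paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("split-parquet").getOrCreate()

# Read the single 5 GB parquet file written by the copy activity
df = spark.read.parquet(
    "wasbs://mycontainer@mystorageaccount.blob.core.windows.net/input/data.parquet"
)

# Repartition and write back as multiple parquet files (one file per partition)
df.repartition(5).write.mode("overwrite").parquet(
    "wasbs://mycontainer@mystorageaccount.blob.core.windows.net/output/"
)

Is there an equivalent way to get the ADF copy activity itself to produce multiple output files?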
Upvotes: 0
Views: 7175
Reputation: 3209
Short question, short answer. These docs cover partitioned output and the parquet sink in ADF:
Partitioned data: https://learn.microsoft.com/en-us/azure/data-factory/how-to-read-write-partitioned-data
Parquet format: https://learn.microsoft.com/en-us/azure/data-factory/format-parquet
Blob storage connector: https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage
Hope this helped!
Upvotes: 1