Reputation: 113
I have around 7-8 files with different but overlapping schemas, and I want the schema of the target (Azure Synapse table) to be the one with the largest number of columns.
My files have 61, 93, and 96 columns, so my target table will have 96 columns,
but while loading the 61-column files, I want the remaining 30-odd columns to be loaded as NULL. I am using a data flow to load recursively with the *.csv wildcard option.
Can anyone please let me know how to achieve this?
My sources are CSV files in Azure Blob Storage and my target is one single table with the 96-column schema.
(One more catch: just one of the files has two columns that are named differently but hold the same values. I don't know how to map these; otherwise they will look like two new columns and we will end up with 98 columns.)
Also, I can't do manual mapping, because doing that for this many files would take a lot of time.
Please let me know. Thanks in advance.
Upvotes: 1
Views: 1022
Reputation: 23782
I tried to implement your requirement with a Copy activity in ADF.
Prepare the test data:
Three files in a Blob Storage container.
Column settings in the SQL DB (all target columns are nullable, so columns missing from a source file can load as NULL):
Then use a Get Metadata activity and a ForEach activity in ADF:
Configuration for Get Metadata Activity:
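For reference, a minimal sketch of what the Get Metadata activity could look like in pipeline JSON, assuming a delimited-text dataset named SourceBlobFolder that points at the container (the dataset and activity names are placeholders of mine):

```
{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "SourceBlobFolder",
            "type": "DatasetReference"
        },
        "fieldList": [
            "childItems"
        ]
    }
}
```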
Configuration for For Each Activity:
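A sketch of the ForEach activity in the same style, iterating over the childItems list returned by the Get Metadata activity above:

```
{
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "Get Metadata1",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "activities": [ ]
    }
}
```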
Inside the ForEach activity, use a Copy activity. Set the Blob Storage file as the source dataset and the SQL DB (I used a SQL DB for the test; you could use a Synapse table here) as the sink dataset. Note that when no explicit mapping is defined, the Copy activity maps columns by name, so any sink column with no matching source column is simply left NULL; that is what gives you the behaviour you want for the 61-column files.
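A sketch of that Copy activity, trimmed to the essentials and assuming the source dataset exposes a fileName parameter (again, the dataset and parameter names are placeholders):

```
{
    "name": "CopyOneFile",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "SourceCsvFile",
            "type": "DatasetReference",
            "parameters": {
                "fileName": "@item().name"
            }
        }
    ],
    "outputs": [
        {
            "referenceName": "SinkSqlTable",
            "type": "DatasetReference"
        }
    ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource"
        },
        "sink": {
            "type": "SqlSink"
        }
    }
}
```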
Test Result in SQL DB:
Furthermore, to be honest, I didn't find any way to automatically map the one file that has two columns named differently but holding the same values. I believe that's not supported by ADF so far. Since you know exactly which columns are affected, you could deal with them in code afterwards (which I don't think would be too tough).
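That said, if a one-off manual mapping for just that single file is acceptable, one possible sketch is a dedicated Copy activity for it with an explicit mapping in its translator (the column names below are placeholders for your actual pair; note that once you define an explicit mapping you have to list every column of that file, which is the manual work you wanted to avoid):

```
"translator": {
    "type": "TabularTranslator",
    "mappings": [
        {
            "source": { "name": "column_name_in_file" },
            "sink": { "name": "column_name_in_table" }
        }
    ]
}
```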
Upvotes: 1