SmallFry
SmallFry

Reputation: 153

Is it possible to combine multiple input files with different schemas using Schema Drift / Dynamic Columns

I have around 20 tab-separated input files. They have in the region of 500 columns, but each will be slightly different.

The sink output schema is known and will contain all the possible input columns.

As a simplified example:

File 1

Name Age DOB Nationality
Bob 21 01/01/1972 British

File2

Name Nationality NINO
Joe British AA995654A

File 3

Name DOB Nationality
Sam 01/01/1990 British

Is it possible to have one DataFlow with multiple inputs, where the schema is not known until runtime, that would cope with changes in the input files and in this case would output:

Name Age DOB NINO Nationality
Bob 21 01/01/1972 NULL British
Joe NULL NULL AA995654A British
Sam NULL 01/01/1990 NULL British

I have looked at column pattern matching and schema drift, but don't see how/if it is possible to achieve this.

Upvotes: 0

Views: 680

Answers (1)

Mark Kromer MSFT
Mark Kromer MSFT

Reputation: 3838

What you will do is to build a logical model in your data flow using a Derived Column with the common model that you wish to conform your input data to. This video shows an example of achieving this: https://www.youtube.com/watch?v=K5tgzLjEE9Q

Upvotes: 1

Related Questions