Reputation: 11
I have a requirement that the schema of the dataset in our destination (ADLS) must not change when there is a change in the source (SAP BW).
I have created a mapping data flow with a source that uses the SAP CDC dataset, which has ODPName, ODPContext, loadType, and runMode parameterized.
I disabled the "Allow schema drift" option and tried two things: first, enabling "Validate schema", and second, disabling all options.
But when I went to validate or publish the change, it threw this error:
Schema should be defined in the dataset or enable allow schema drift
So, I went to define a schema in the dataset, but there is no schema option in the SAP CDC dataset.
Could anyone please help me here? I am stuck on this issue.
Also, the Projection option does not help, because the pipeline is parameterized and ODPName and ODPContext are only taken at runtime. If I am missing something, please do let me know.
Upvotes: 0
Views: 58
Reputation: 11454
The "Validate schema" option will not work with a parameterized, dynamically changing schema. It is meant for the case where you have imported the projection and the data changes later, during the data flow run.
In this case, you can use an assert transformation to fail the data flow. Follow the data flow design below.
After all of the transformations on your source, take a derived column transformation and create a new column source_col_names with the expression columnNames().
Now, take another source and give it your target ADLS dataset. Similarly, take another derived column transformation and create a new column target_col_names with the same columnNames() expression, as sketched below.
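For reference, here is roughly what these two branches look like in the underlying data flow script. The stream names SAPsource and ADLSsource are placeholders for your own sources, and derivedColumn1/derivedColumn2 are the auto-generated names I will refer to in the rest of this answer:

SAPsource derive(source_col_names = columnNames()) ~> derivedColumn1
ADLSsource derive(target_col_names = columnNames()) ~> derivedColumn2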
Now, compare both of these array columns using an assert transformation, configured as below.
On derivedColumn1 (the source branch), take an assert transformation and add derivedColumn2 (the target branch) in the Additional streams. Check Fail data flow and select Expect exists as the assert type. Give any description, and in the expression select source_col_names from the source branch and target_col_names from the target branch.
If the source_col_names array, i.e., the column names of the source, does not exist in target_col_names, the target's column-names array, the data flow will fail. Otherwise, the rows pass on to the next transformation.
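In data flow script terms, the assert comes out looking roughly like the line below. The assert ID 'assert1' and the exact argument order are approximations, so treat the script the designer generates for you as authoritative:

derivedColumn1, derivedColumn2 assert(expectExists(source_col_names == target_col_names, false, 'assert1', null, 'source columns must exist in the target')) ~> assert1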
Next, use a select transformation with rule-based mapping to remove the extra helper columns, using the rule below.
(name!='target_col_names')&&(name!='source_col_names')
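Assuming the assert stream is named assert1, the rule-based mapping would look something like this in data flow script:

assert1 select(mapColumn(
        each(match(name != 'source_col_names' && name != 'target_col_names'))
    ),
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> select1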
Add your target dataset again in the sink. If the schema is the same, the source data is copied to the target file; otherwise, the data flow fails at the assert.
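A minimal sink sketch, keeping schema drift disabled (schema validation can stay off here, since the assert already enforces the column check):

select1 sink(allowSchemaDrift: false,
    validateSchema: false) ~> sink1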
Upvotes: 0