user1810575

Standard data validation practices for multiple partitions & multiple loads in a single day

I'm looking for a technique to validate data between layers.

Here is the data flow:

Source (RDBMS) > flat files (stage) > AVRO/JSON (final destination) on Azure.

The problem is that a single table can produce multiple flat files (partitions) at the stage layer, and from there it can be split into potentially even more partitions at the destination.

The plan is to create a SQL table with a set of columns, but I'm not sure how to handle partitions and multiple job loads.

Here is the basic table idea:

Data validation table: dt_validation

JobId | tblname | RC_RDBMS | RC_FF | RC_AVRO | Job_run_date | Partition_1 | Partition_2

RC = row count, FF = flat file.

Note: the idea is that each time the data passes through a layer, I'll get the row count (RC) and insert/update the table.
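A minimal sketch of the insert/update step from the note, assuming Python's built-in `sqlite3` module as a stand-in for the actual SQL database (the target database isn't specified). The column names come from the `dt_validation` idea above; the composite key, the `''` default for unused partition columns, the helper `record_count`, and the sample counts are all hypothetical.

```python
import sqlite3

# Columns follow the dt_validation idea; the composite key is an assumption:
# one row per job, table, run date and partition combination.
DDL = """
CREATE TABLE IF NOT EXISTS dt_validation (
    JobId        TEXT    NOT NULL,
    tblname      TEXT    NOT NULL,
    Job_run_date TEXT    NOT NULL,
    Partition_1  TEXT    NOT NULL DEFAULT '',
    Partition_2  TEXT    NOT NULL DEFAULT '',
    RC_RDBMS     INTEGER,
    RC_FF        INTEGER,
    RC_AVRO      INTEGER,
    PRIMARY KEY (JobId, tblname, Job_run_date, Partition_1, Partition_2)
)
"""

# Only these column names are ever interpolated into the UPDATE statement.
COUNT_COLUMNS = {"RC_RDBMS", "RC_FF", "RC_AVRO"}


def record_count(conn, job_id, tblname, run_date, p1, p2, column, row_count):
    """Hypothetical helper: create the row if missing, then set one layer's count."""
    if column not in COUNT_COLUMNS:
        raise ValueError(f"unknown count column: {column}")
    key = (job_id, tblname, run_date, p1, p2)
    conn.execute(
        "INSERT OR IGNORE INTO dt_validation "
        "(JobId, tblname, Job_run_date, Partition_1, Partition_2) "
        "VALUES (?, ?, ?, ?, ?)",
        key,
    )
    conn.execute(
        f"UPDATE dt_validation SET {column} = ? "
        "WHERE JobId = ? AND tblname = ? AND Job_run_date = ? "
        "AND Partition_1 = ? AND Partition_2 = ?",
        (row_count, *key),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# One load passing through the three layers (sample counts for illustration).
record_count(conn, "job-001", "orders", "2024-01-15", "p0", "", "RC_RDBMS", 1000)
record_count(conn, "job-001", "orders", "2024-01-15", "p0", "", "RC_FF", 1000)
record_count(conn, "job-001", "orders", "2024-01-15", "p0", "", "RC_AVRO", 998)
print(conn.execute("SELECT RC_RDBMS, RC_FF, RC_AVRO FROM dt_validation").fetchall())
```

Because the key includes the partition columns, a load that splits into several flat files or AVRO partitions needs one row per partition, and a source count taken once per table has no single partition row where it naturally belongs. That is exactly the gap the question is about.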

Does the above table design work for multiple partitions and multiple loads/jobs in a single day?

I need suggestions on what my table should look like, given partitions and multiple loads in a single day.
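One possible alternative layout, sketched under the same `sqlite3` stand-in assumption: store one row per run, table, layer and partition (a "long" layout) and aggregate only when comparing. The table `dt_validation_counts`, its columns, the `run_id` values and the counts are hypothetical. Each load on the same day gets its own `run_id`, and a layer can have any number of partitions because each partition is simply another row.

```python
import sqlite3

# Hypothetical long layout: one row per (run, table, layer, partition).
DDL = """
CREATE TABLE dt_validation_counts (
    run_id         TEXT    NOT NULL,
    tblname        TEXT    NOT NULL,
    layer          TEXT    NOT NULL,  -- 'RDBMS', 'FF' or 'AVRO'
    partition_name TEXT    NOT NULL,  -- '' when the layer is unpartitioned
    row_count      INTEGER NOT NULL,
    run_date       TEXT    NOT NULL,
    PRIMARY KEY (run_id, tblname, layer, partition_name)
)
"""

# Total rows per run and layer, pivoted so the three layers sit side by side.
RECONCILE = """
SELECT run_id, tblname,
       SUM(CASE WHEN layer = 'RDBMS' THEN row_count END) AS rc_rdbms,
       SUM(CASE WHEN layer = 'FF'    THEN row_count END) AS rc_ff,
       SUM(CASE WHEN layer = 'AVRO'  THEN row_count END) AS rc_avro
FROM dt_validation_counts
GROUP BY run_id, tblname
ORDER BY run_id, tblname
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
rows = [
    # Morning load: 1 source extract, 2 flat files, 3 AVRO partitions.
    ("run-0800", "orders", "RDBMS", "", 1000, "2024-01-15"),
    ("run-0800", "orders", "FF", "part-0", 600, "2024-01-15"),
    ("run-0800", "orders", "FF", "part-1", 400, "2024-01-15"),
    ("run-0800", "orders", "AVRO", "p=a", 500, "2024-01-15"),
    ("run-0800", "orders", "AVRO", "p=b", 300, "2024-01-15"),
    ("run-0800", "orders", "AVRO", "p=c", 200, "2024-01-15"),
    # Afternoon load on the same day; the AVRO layer is 5 rows short.
    ("run-1400", "orders", "RDBMS", "", 250, "2024-01-15"),
    ("run-1400", "orders", "FF", "part-0", 250, "2024-01-15"),
    ("run-1400", "orders", "AVRO", "p=a", 245, "2024-01-15"),
]
conn.executemany("INSERT INTO dt_validation_counts VALUES (?, ?, ?, ?, ?, ?)", rows)

for run_id, tbl, rc_rdbms, rc_ff, rc_avro in conn.execute(RECONCILE):
    status = "OK" if rc_rdbms == rc_ff == rc_avro else "MISMATCH"
    print(run_id, tbl, rc_rdbms, rc_ff, rc_avro, status)
```

The design choice here is to never compare partitions directly, since partition boundaries differ between layers; only per-run totals are compared. A layer that has not reported yet sums to NULL (`None` in Python), so that run is flagged as a mismatch until all three layers are in.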
