Reputation: 1493
I am developing an ETL process that extracts business data from one database into a data warehouse. The application is NOT using NHibernate, LINQ to SQL or Entity Framework. The application has its own generated data access classes that generate the necessary SQL statements to perform CRUD.
As one can imagine, developers who write code that generates custom SQL can easily make mistakes.
I would like to write a program that generates testing data (Arrange), then performs the ETL process (Act) and validates the data warehouse (Assert).
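A minimal sketch of the shape I have in mind (NUnit is assumed; the table names, connection strings, and the EtlProcess.Run entry point are placeholders, not our real code):

```csharp
using System.Data.SqlClient;
using NUnit.Framework;

[TestFixture]
public class EtlIntegrationTests
{
    // Connection strings for dedicated test databases; values are placeholders.
    private const string Source = "Server=build;Database=SourceTest;Integrated Security=true";
    private const string Warehouse = "Server=build;Database=WarehouseTest;Integrated Security=true";

    [Test]
    public void OrderIsLoadedIntoFactTable()
    {
        // Arrange: insert one known row into the source database.
        Execute(Source, "INSERT INTO Orders (OrderId, Amount) VALUES (1, 100.00)");

        // Act: run the ETL process against the test databases.
        // EtlProcess.Run is a placeholder for however the real job is started.
        EtlProcess.Run(Source, Warehouse);

        // Assert: the warehouse fact table received the expected row.
        int rows = ScalarInt(Warehouse,
            "SELECT COUNT(*) FROM FactOrders WHERE OrderId = 1 AND Amount = 100.00");
        Assert.AreEqual(1, rows);
    }

    private static void Execute(string connectionString, string sql)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

    private static int ScalarInt(string connectionString, string sql)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }
}
```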
I don't think it is hard to write such a program. What worries me is that in the past my company attempted something similar and ended up with a bunch of unmaintainable unit tests that constantly failed because of the many changes made to the database schema as new features were added.
My plan is to write an integration test that runs on the build machine, rather than any unit tests, to ensure the ETL process works. The testing data cannot be generated totally at random because of the business logic that determines how data is loaded into the data warehouse. We have a custom development tool that generates new data access classes whenever the database definition changes.
I would love feedback from the community advising me on how to write such an integration test so that it is easy to maintain. Some ideas I have:
Save a backup testing database in version control (TFS); developers will need to modify the backup database when there are data changes to the source or the data warehouse.
Developers maintain the testing data manually through the testing program (C# in this case). This program would provide a basic framework for developers to generate their testing data.
When the test database is initialized, it generates random data. Developers will need to write code to override certain randomly generated values to ensure the test passes (see the sketch below).
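For that third idea, a minimal sketch of what I mean by overriding random values, using a builder; Customer and its properties are hypothetical examples, not our real schema:

```csharp
using System;

// Sketch of the third idea: the framework fills every column with random
// values, and each test overrides only the fields its business rule needs.
// Customer and its properties are hypothetical examples, not a real schema.
public class CustomerBuilder
{
    private static readonly Random Rng = new Random();

    private int _id = Rng.Next(1, 1000000);
    private string _name = "Name" + Rng.Next(1000);
    private decimal _creditLimit = Rng.Next(0, 10000);

    // Overrides for the handful of values a given test actually cares about.
    public CustomerBuilder WithId(int id) { _id = id; return this; }
    public CustomerBuilder WithCreditLimit(decimal limit) { _creditLimit = limit; return this; }

    public Customer Build()
    {
        return new Customer { Id = _id, Name = _name, CreditLimit = _creditLimit };
    }
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal CreditLimit { get; set; }
}
```

A test of the credit-limit rule would then call `new CustomerBuilder().WithCreditLimit(0m).Build()` and leave everything else random.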
I welcome any suggestions. Thanks
Upvotes: 3
Views: 920
Reputation: 11
If you want to maintain data while performing data integration testing in ETL, you could also follow these steps, since integration testing covers the ETL process and the related applications. For example:
1. Set up test data in the source system.
2. Execute the ETL process to load the test data into the target.
3. View or process the data in the target system.
4. Validate the data and the application functionality that uses the data (see the sketch below).
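Step 4 can often be reduced to reconciliation checks between source and target; a minimal sketch, where the table and column names are hypothetical:

```csharp
using System.Data.SqlClient;

// Minimal sketch of step 4: reconcile an aggregate between source and target
// after the ETL run. Table and column names are hypothetical; a complete
// validation would also check the business rules applied during the load.
public static class EtlValidator
{
    public static bool TotalsMatch(string sourceConn, string targetConn)
    {
        decimal sourceTotal = SumAmount(sourceConn, "Orders");
        decimal targetTotal = SumAmount(targetConn, "FactOrders");
        return sourceTotal == targetTotal;
    }

    private static decimal SumAmount(string connectionString, string table)
    {
        // Table names here are compile-time constants, not user input.
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT COALESCE(SUM(Amount), 0) FROM " + table, conn))
        {
            conn.Open();
            return (decimal)cmd.ExecuteScalar();
        }
    }
}
```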
Upvotes: 0
Reputation: 20044
First, let me say that I think it's a good plan; I did something similar using Oracle & PL/SQL some years ago. IMHO your problem is mainly an organizational one, not a technical one.
Upvotes: 0
Reputation: 597
Hey dsum, although I don't really know the whole architecture of your ETL, I would say that integration testing should only be another step in your testing process.
Even if unit testing ended up in a mess on your first attempt, you should keep in mind that in many cases a single unit test is the best place to check something. Or do you want to split the whole integration test for a three-way case, or something similar further down, in order to guarantee the right flow under each of the three conditions?
Messy unit tests are only the result of messy production code. Don't feel offended; that's just my opinion. Unit tests force coders to keep a clean coding style and make the whole thing much more maintainable.
So... my point is that you should think about not performing only integration testing on the whole thing, because unit tests (if they are used in the right way) can focus on problems in more detail.
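For example, a single focused unit test can pin down one piece of SQL generation without any database at all; a minimal sketch, where CustomerDao, BuildInsertSql, and Customer are hypothetical stand-ins for one of your generated data access classes:

```csharp
using NUnit.Framework;

[TestFixture]
public class CustomerDaoTests
{
    // A focused unit test that checks one piece of SQL generation without
    // touching a database. CustomerDao and BuildInsertSql are hypothetical
    // stand-ins for one of the generated data access classes.
    [Test]
    public void BuildInsertSql_EscapesSingleQuotesInNames()
    {
        var dao = new CustomerDao();

        string sql = dao.BuildInsertSql(new Customer { Name = "O'Brien" });

        StringAssert.Contains("O''Brien", sql);
    }
}
```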
Regards, MacX
Upvotes: 1