Reputation: 67223

Should your test data be in the same form as the live data?

When testing systems (any system, really, e.g. a database), is it important that the test data is in the same form (format) as the live data?

To what degree do you allow differences in the two types of data?

Thanks

Upvotes: 2

Answers (5)

Ethel Evans

Reputation: 755

I try to use both: test data that hits specific cases I have designed (often modified from live data), and, whenever it is available, a significant volume of live data. The live data hits a large number of scenarios that could definitely impact customers and may include scenarios I haven't thought of.

Keep in mind precisely what you are testing at any given moment. If you are just testing that the data acceptance service grabs files, and it is supposed to grab any file and reject bad formats later, then you don't care much about what is inside the file, and you will need at least some test files in other formats. In that case, just changing the extension on a Notepad text file may be enough for functional testing, with some large files generated to test file size, etc.
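As a rough sketch of that idea in Python: the helper below writes the same small text file under several extensions and generates one large file for size testing. The function name, the extensions, and the default size are all hypothetical, chosen only for illustration; adjust them to whatever your acceptance service is supposed to pick up.

```python
import os
import tempfile


def make_test_files(directory, extensions=(".csv", ".xml", ".dat"),
                    large_size=5 * 1024 * 1024):
    """Create placeholder files: identical text under several extensions,
    plus one large file for size testing. All names are hypothetical."""
    paths = []
    for ext in extensions:
        path = os.path.join(directory, "sample" + ext)
        with open(path, "w", encoding="utf-8") as f:
            # The content deliberately does not match the extension.
            f.write("plain text, not really " + ext + " data\n")
        paths.append(path)

    # Write the large file in 1 MiB chunks so it never sits fully in memory.
    large_path = os.path.join(directory, "large.bin")
    chunk = b"\0" * (1024 * 1024)
    remaining = large_size
    with open(large_path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n
    paths.append(large_path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in make_test_files(tmp):
            print(os.path.basename(p), os.path.getsize(p))
```

Files like these only exercise the "grab anything" step; they say nothing about how the later format validation behaves.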

Using test data that doesn't match the live format could be especially useful if the format is still being worked out while the devs start work on the other parts of the system. However, at some point you will want to run live or similar-to-live data through every part of your system for integration and end-to-end testing.

Upvotes: 1

JamieDainton

Reputation: 165

I think it's more complex than some people have made out, and I would generally have the following test environments:

Unit Test - Partial Copy of production data
System Test - Stale but full copy of production data with interfaces from other system test environments
Production Acceptance - Same as system test but fed from other PA systems and may have more data if you use massive data sets
Production maintenance - Copy of production refreshed frequently (e.g. weekly), with no interfaces but the ability to implement them quickly. This is used for fixing big production mistakes.

Upvotes: 0

Thomas Owens

Reputation: 116169

I disagree with MusiGenesis, unless you are testing your ability to read from the data source.

If you are just testing how the system performs with certain data, then you can just use mocking to remove all connectivity to external data sources. However, if you need to test things like handling failures in connections and dropping connections, then you will probably want to try to connect to the same type of data source.
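Here is a minimal sketch of the mocking approach using Python's standard `unittest.mock`. The `fetch_orders()` interface and both functions are hypothetical, made up for this example; the point is only that the logic under test never touches a real data source.

```python
from unittest.mock import Mock


def average_order_value(source):
    """Logic under test. `source` is any object with a fetch_orders()
    method returning a list of dicts (a hypothetical interface)."""
    orders = source.fetch_orders()
    if not orders:
        return 0.0
    return sum(o["total"] for o in orders) / len(orders)


def safe_average(source):
    """Same calculation, but reports a dropped connection as None."""
    try:
        return average_order_value(source)
    except ConnectionError:
        return None


# Mock standing in for the real data source: no network, fixed data.
fake_source = Mock()
fake_source.fetch_orders.return_value = [{"total": 10.0}, {"total": 30.0}]

# Mock that simulates a dropped connection.
broken_source = Mock()
broken_source.fetch_orders.side_effect = ConnectionError("connection dropped")

if __name__ == "__main__":
    print(average_order_value(fake_source))
    print(safe_average(broken_source))
```

The second mock shows that a failure can be simulated, but it only proves your own error handling runs. It does not tell you how the real driver or server behaves when a connection drops, which is why connecting to the same type of data source still matters for those tests.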

Upvotes: 0

Jacob Mattison

Reputation: 51052

Barring specific reasons to use fake data, I think it's important to get as close as you can to the live data when testing. Otherwise you will definitely miss issues.

Specific reasons you might use fake data:

live data has privacy or sensitivity concerns; you might use fake credit card numbers (but with the proper format), you might obfuscate names or phone numbers
live data volume is too high for speedy testing; in this case you should select a representative sample
using live data might cause external impacts; for example, you might not want to use real email addresses if emails could go to real users during tests. However, this last one is better solved by mocking your email system.
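To make the first two points concrete, here is a hedged Python sketch: it generates fake card numbers that pass the Luhn checksum (so they have the proper format), replaces names with stable pseudonyms, and draws a simple random sample. All function names, the `"4"` prefix, and the salt are assumptions for illustration. A plain random sample is also only one reading of "representative"; stratified sampling may suit your data better.

```python
import hashlib
import random


def luhn_check_digit(partial):
    """Return the Luhn check digit for a digit string lacking its check digit."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def fake_card_number(rng, prefix="4", length=16):
    """Build a random number with a valid Luhn checksum (not a real card)."""
    body_len = length - len(prefix) - 1
    body = prefix + "".join(str(rng.randrange(10)) for _ in range(body_len))
    return body + luhn_check_digit(body)


def luhn_valid(number):
    """Check that the last digit matches the Luhn checksum of the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]


def pseudonymize(value, salt="test-salt"):
    """Map a name to a stable pseudonym so joins across tables still line up."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "user_" + digest[:10]


def sample_rows(rows, k, seed=0):
    """Draw a reproducible simple random sample of at most k rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


if __name__ == "__main__":
    rng = random.Random(42)
    card = fake_card_number(rng)
    print(card, luhn_valid(card))
    print(pseudonymize("Alice Example"))
    print(sample_rows(list(range(1000)), 5))
```

Passing the Luhn check only means the number is well-formed. A random well-formed number could still coincide with an issued card, so keep generated numbers away from any real payment gateway.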

Upvotes: 2

MusiGenesis

Reputation: 75276

Put it this way: the more different your test data is from your live data, the less valuable the testing actually is. So yes, your test data should be as close as possible to your live data.

Upvotes: 5
