Have a huge database based on a specific design (for performance tests)

Question

I thank you for your time.

I know there are threads on StackOverflow that look similar, but none of them are answering my question.

Given a database design, is there a tool that can populate it with a huge amount of data based on that design (for future performance tests)?
Or should I write a server-side script that inserts random values?

In fact, I am deciding between 3 database designs, and I need to test the performance of each one when they hold the same amount of data.

Any suggestions and clarifications are highly appreciated.

griffin · Accepted Answer

If you want to benchmark different things, you should always keep as much of the context in common as possible. In your case this means: don't use a different random dataset for each run, but the same dataset for every implementation/database/... you test. If you want pseudo-random data, you can still achieve this by using a seeded random number generator, or by generating the set once and then reusing it.
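As a rough illustration of the seeded approach, here is a minimal Python sketch. The row layout (id, customer name, amount) and the function name `generate_rows` are made up for the example and would need to be adapted to your own design; the only point is that the same seed always gives the same dataset.

```python
import random

def generate_rows(n, seed=42):
    """Return n pseudo-random rows; the same seed always yields the same rows.

    The (id, customer, amount) layout is a hypothetical example schema.
    """
    rng = random.Random(seed)  # private generator, unaffected by other random calls
    rows = []
    for i in range(n):
        customer = f"customer{rng.randrange(50)}"
        amount = rng.randint(1, 500)
        rows.append((i, customer, amount))
    return rows

# Two runs with the same seed produce an identical dataset,
# so every design under test can be loaded with exactly the same data.
first = generate_rows(1000, seed=7)
second = generate_rows(1000, seed=7)
```

Using a dedicated `random.Random(seed)` instance instead of the module-level functions keeps the dataset reproducible even if other code also draws random numbers.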
There are automation tools out there, but since you would have to enter your schema anyway, it's most probably a better idea to write a simple script. A script has the advantage of being more flexible, and it can also be used in automated testing. E.g. you could write an Ant script which calls your script to generate a dataset, then runs a list of queries while timing them, outputs the results, and repeats the same over different configurations.
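A sketch of what such a script could look like, assuming Python with the built-in `sqlite3` module in place of your real database. The two designs ("flat" and "normalized"), the table names and the query are all hypothetical stand-ins for your three candidates; the pattern is: generate one dataset, load it into each design, run an equivalent query against each, and time it.

```python
import random
import sqlite3
import time

def make_dataset(n, seed=42):
    # Same seed -> same rows, so every design is tested on identical data.
    rng = random.Random(seed)
    return [(i, f"customer{rng.randrange(50)}", rng.randint(1, 500)) for i in range(n)]

def load_flat(conn, rows):
    # Hypothetical design 1: everything in one table.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

def load_normalized(conn, rows):
    # Hypothetical design 2: customers split into their own table.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)")
    names = sorted({name for _, name, _ in rows})
    conn.executemany("INSERT INTO customers (name) VALUES (?)", [(name,) for name in names])
    ids = dict(conn.execute("SELECT name, id FROM customers"))
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, ids[name], amount) for i, name, amount in rows],
    )

# Each design gets a loader and a query that answers the same question.
DESIGNS = {
    "flat": (
        load_flat,
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer",
    ),
    "normalized": (
        load_normalized,
        "SELECT c.name, SUM(o.amount) FROM orders o "
        "JOIN customers c ON c.id = o.customer_id GROUP BY c.name ORDER BY c.name",
    ),
}

def benchmark(rows, repeats=3):
    results = {}
    for name, (load, query) in DESIGNS.items():
        conn = sqlite3.connect(":memory:")
        load(conn, rows)
        conn.commit()
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            answer = conn.execute(query).fetchall()
            timings.append(time.perf_counter() - start)
        conn.close()
        results[name] = (min(timings), answer)  # best time plus the query result
    return results

if __name__ == "__main__":
    for name, (seconds, _) in benchmark(make_dataset(100_000)).items():
        print(f"{name}: {seconds:.4f} s")
```

Keeping the query result next to the timing makes it easy to check that all designs actually return the same answer before you compare their speed.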
When testing different DB schemas for performance, you should always make sure to also account for differences in code/access etc. For example, a schema that is 10% faster in synthetic tests could still be 50% slower in your application, just because the code you need to access the data ends up 10 times more complicated. That is also why most of the time it doesn't make much sense to benchmark a schema by itself, isolated from the application/code/... which will be using it.
Make sure to spend an equal amount of time on optimizing each tested schema afterwards, as after optimizations (indexes etc.) the results could turn around completely.
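To see how much a single optimization can change the picture, here is a small sketch, again assuming SQLite via Python's `sqlite3` module and a hypothetical `orders` table. It prints the query plan for the same query before and after adding an index; your own database will have its own way of showing plans.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN is SQLite-specific; the last column holds the description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, only for illustration.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"customer{i % 50}", i) for i in range(5000)],
)

SQL = "SELECT SUM(amount) FROM orders WHERE customer = ?"

plan_before = query_plan(conn, SQL, ("customer7",))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = query_plan(conn, SQL, ("customer7",))   # now the index can be used

print("before:", plan_before)
print("after: ", plan_after)
```

A design that loses the unoptimized comparison may win once each candidate has the indexes its queries need, so it is worth re-running the timings after this step.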
