Reputation: 1639
I have some performance tests for an index structure on some data. I will be comparing 2 indexes side-by-side (still not decided if I will be using 2 VMs). I require results to be as neutral as possible of course, so I have these kinds of questions which I would appreciate any input about... How can I ensure/control what is influencing the test? For example, caching effects/order of arrival from one test to another will influence the result. How can I measure these influences? How do I create a suitable warm-up? Or what kind of statistical techniques can I use to nullify such influences (I don't think just averages is enough)?
Upvotes: 2
Views: 889
Reputation: 5682
As neutral as possible, then create two databases on the same instance of your database management system, then create the same tablespaces with data, using indexes on one instance but not the other.
The challenge with a VM is you have arbitrated access to your disk resources ( unless you have each VM pinned to a specific interface and disk set ). Because of this, your arbitration model could vary from one test to the next. The most neutral course, which removes the arbitration, is on physical hardware....and the same hardware in both cases.
Upvotes: 0
Reputation: 6328
Before you start:
If you just want a comparison, and not a test under realistic use, I'd just do:
If you're interested in results when there are cache effects: Perform test again 10 times (or any number of times as big as reasonable). Measure each time, to account for variability due to other processes running on the VM, and other contingencies.
Reboot your machine, and repeat the whole process for test2. There are methods to clean the OS cache; but they're very system dependent, and you don't have a way to clean the database cache. Check See and clear Postgres caches/buffers?.
If you are really (or mostly) interested in performance when there are no cache effects, you should perform the whole process several times. It's slow and tedious. If you're only interested in the case where there's (most probably) a cache effect, you don't need to restart again.
Perform an ANOVA (or any other statistical hypothesis test you might think more suited) to decide if your average time is statistically different or not.
You can see an example of performing several tests in the answer to a question about NOT NULL versus CHECK(xx NOT NULL).
Upvotes: 4