Reputation: 8025
If Yelp wanted to understand whether ratings help users pick a listing, and we use CTR as the success metric for the A/B test, how do we know that a significant change in CTR is due to just the ratings and not other parts of the listing, like the reviews?
Do we have to do some kind of user segmentation instead of randomly assigning users before running the A/B test?
Upvotes: 1
Views: 240
Reputation: 503
You generally want to trust randomization for most experiments. Randomization is an unbiased process that, with enough users, controls for all possible confounding factors, both known (e.g., age, gender, and OS) and unknown (e.g., personality, hair color, and sophistication), making comparisons between the test and control groups balanced and fair. Since both groups are exposed and measured simultaneously, A/B testing also corrects for temporal and seasonal effects. Statistically significant differences between the test and control groups can therefore be directly attributed to the change being tested. I wrote more about this in a blog post.
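Here is a minimal sketch of that setup, with hypothetical numbers: users are deterministically hashed into a bucket (so assignment is stable but effectively random), and a two-proportion z-test checks whether the CTR gap between buckets is larger than chance would explain. The bucketing helper, salt, and click counts are illustrative, not from any real experiment.

```python
import hashlib

import numpy as np
from statsmodels.stats.proportion import proportions_ztest

def assign_bucket(user_id: str, salt: str = "ratings_test_v1") -> str:
    """Hash the salted user id to randomize users into control or test.

    The hash ignores every user attribute, so known and unknown
    confounders are balanced across the two groups in expectation.
    """
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"

# Hypothetical aggregated results: clicks and impressions for [test, control].
clicks = np.array([1210, 1105])
impressions = np.array([20000, 20000])

# Two-proportion z-test: is the observed CTR difference beyond chance?
stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
print(f"test CTR={clicks[0]/impressions[0]:.4f}, "
      f"control CTR={clicks[1]/impressions[1]:.4f}, p={p_value:.4f}")
```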
A custom user segmentation is usually reserved for the rare cases where randomization can be expected to produce unbalanced groups. For example, suppose you split a room of 100 people into two groups, but Bill Gates and Elon Musk are in that room. Depending on the metric you want to measure, they can severely skew the results, and randomization will put both billionaires in the same group half the time. That is a scenario where it's worth doing a custom (stratified) segmentation and enforcing that they end up in different groups (see the sketch below). But this sort of thing is rare and rarely affects binary metrics like CTR.
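As a rough illustration of that stratified assignment (the names and the `stratified_split` helper are hypothetical, not from the answer): units are shuffled within each stratum and assignments alternate, so each stratum splits as evenly as possible and the two outliers always land in different groups.

```python
import random
from collections import defaultdict

def stratified_split(units, stratum_of, seed=7):
    """Block (stratified) randomization: shuffle within each stratum and
    alternate assignments, so every stratum is split as evenly as possible
    between test and control."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        for i, u in enumerate(members):
            assignment[u] = "test" if i % 2 == 0 else "control"
    return assignment

# Hypothetical room of 100 people where two extreme outliers could skew a
# spend-per-person metric; stratifying on "billionaire" forces one into each group.
people = [f"person_{i}" for i in range(98)] + ["Bill Gates", "Elon Musk"]
assignment = stratified_split(
    people,
    stratum_of=lambda p: "billionaire" if p in {"Bill Gates", "Elon Musk"} else "regular",
)
print(assignment["Bill Gates"], assignment["Elon Musk"])  # always one in each group
```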
Upvotes: 0
Reputation: 798
Randomization takes care of all variables other than the treatment. A test of statistical significance takes care of the choice between the treatment and chance. It's only when you can't do a randomized trial that you need to control for other differentiators.
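A quick simulation (mine, not the answerer's) of the first point: even a covariate the experimenter never observes, here a made-up "sophistication" score, ends up balanced between groups once assignment is random and the sample is large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unobserved covariate that might also drive clicks.
n_users = 100_000
sophistication = rng.normal(loc=0.0, scale=1.0, size=n_users)

# Pure random assignment, independent of every user attribute.
in_test = rng.random(n_users) < 0.5

print(f"mean sophistication, test:    {sophistication[in_test].mean():+.4f}")
print(f"mean sophistication, control: {sophistication[~in_test].mean():+.4f}")
# With enough users the two means are nearly identical, so a CTR gap between
# the groups cannot be explained by this (or any other) covariate.
```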
Upvotes: 1