AB test design in eCommerce - group split per user and statistic aggregation per item

Question

Is it statistically correct/viable to run an A/B test where the A/B group split is per user and then the statistic is aggregated per item?

Let's narrow the issue down to a specific example:

Setting: An online shop where multiple companies post items that users can purchase. Companies can purchase addons that boost item positioning on the website.
Goal: Increase the fraction of items that reach a specific addon click-through target (e.g. 5%).
Test: We're dealing with a binomial outcome (an item either reaches the target or it doesn't), so we chose Fisher's exact test. There also aren't that many items, so the exact test shouldn't be computationally expensive.

Example data:
Addon click-through target: 5%

Group	Items that reached target	Items that didn't reach target
group A	5216	1295
group B	5558	953

Fisher's exact p-value is less than 0.0001, so the result is statistically significant at alpha=0.05.
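
For anyone who wants to reproduce the p-value, here is a minimal sketch in plain Python of a two-sided Fisher's exact test on the table. The helper names (`log_comb`, `fisher_exact_two_sided`) are made up for this illustration and are not from any library; in practice `scipy.stats.fisher_exact` would be the usual choice. The sketch only recomputes the p-value and says nothing about whether the per-user split makes the test valid.

```python
import math


def log_comb(n, k):
    """Log of the binomial coefficient C(n, k), computed via lgamma to avoid overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration only.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    log_denom = log_comb(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric log-probability of x in the top-left cell, all margins fixed
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    log_observed = log_prob(a)

    # Sum the probabilities of all tables that are no more likely than the observed one
    # (small tolerance guards against floating-point ties)
    total = sum(
        math.exp(log_prob(x))
        for x in range(lowest, highest + 1)
        if log_prob(x) <= log_observed + 1e-7
    )
    return min(1.0, total)


# Example data from the table above
reached_a, missed_a = 5216, 1295
reached_b, missed_b = 5558, 953

rate_a = reached_a / (reached_a + missed_a)
rate_b = reached_b / (reached_b + missed_b)
p_value = fisher_exact_two_sided(reached_a, missed_a, reached_b, missed_b)

print(f"share of items reaching target: A={rate_a:.4f}, B={rate_b:.4f}")
print(f"two-sided Fisher exact p-value: {p_value:.3g}")
```

Note that both groups contain the same 6511 items, so the two rows of the table are not independent samples of items, which is part of what the question is about.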

My concern is that this methodology (group split by users, aggregation by items) violates some assumptions of A/B test design and theory. We ran 500 A/A Fisher's exact tests with alpha=0.05, and only a fraction of 0.012 of those 500 simulations (6 runs) came out statistically significant, which is below the nominal 5%.
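
To put the A/A result in perspective, here is a rough back-of-the-envelope check. It assumes the 500 A/A runs are independent and reads 0.012 as 6 significant runs out of 500; both are assumptions for this sketch. It computes how many significant runs a nominal alpha of 0.05 would produce on average, and how likely 6 or fewer would be under that nominal rate. The function name `binom_cdf` is just for illustration.

```python
import math

n_tests = 500               # number of A/A runs
alpha = 0.05                # nominal significance level
observed_fraction = 0.012   # fraction of A/A runs that came out significant

# Assumption: 0.012 of 500 corresponds to 6 significant runs
observed_hits = round(observed_fraction * n_tests)

# Average number of false positives if the test rejected at exactly the nominal rate
expected_hits = alpha * n_tests


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Probability of seeing this few (or fewer) significant runs at the nominal rate
p_at_most = binom_cdf(observed_hits, n_tests, alpha)

print(f"observed significant runs: {observed_hits}")
print(f"expected at alpha={alpha}: {expected_hits:.0f}")
print(f"P(X <= {observed_hits}) under the nominal rate: {p_at_most:.3g}")
```

A very small tail probability would suggest the observed rate is genuinely below 5% rather than noise, i.e. the test looks conservative in this setup rather than inflated.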

I tried looking online for articles that use this methodology, but I was unable to find relevant sources given the flood of "AB test tutorials" (maybe my search skills suck). I asked GenAI and the model doesn't seem to have a problem with such an approach but... it's GenAI.

Can anyone elaborate on this? Any relevant sources or links?
