In A/B testing, it's fairly common to split users with modulo arithmetic, but that often causes overlapping experiments: if every experiment uses `id % 2 == 0` as its split criterion, the same set of users consistently lands in control (or in the experiment group) across all of them.
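To make the overlap concrete, here is a small sketch comparing how often a user lands in the same arm of two experiments. It contrasts a shared `id % 2 == 0` rule with a hash salted by the experiment name. The user IDs, the experiment names `exp_a`/`exp_b`, and the `:` separator are hypothetical. The hashed arm uses MD5 and scales the first six hex digits to [0, 1], an illustrative choice.

```ruby
require 'digest'

# Hypothetical user IDs; both experiments split on the same modulo rule.
user_ids = (1..10_000).to_a

# Modulo split: the arm depends only on the ID, so every experiment agrees.
mod_arm = ->(id, _experiment) { id % 2 == 0 ? :treatment : :control }

# Salted hash split (illustrative): mixing in the experiment name gives each
# experiment its own, roughly independent assignment.
hash_arm = lambda do |id, experiment|
  fraction = Digest::MD5.hexdigest("#{id}:#{experiment}")[0, 6].to_i(16).to_f / 0xFFFFFF
  fraction < 0.5 ? :treatment : :control
end

# Fraction of users who land in the same arm in experiments "exp_a" and "exp_b".
def agreement(user_ids, splitter)
  same = user_ids.count { |id| splitter.call(id, 'exp_a') == splitter.call(id, 'exp_b') }
  same.to_f / user_ids.size
end

mod_agreement = agreement(user_ids, mod_arm)
hash_agreement = agreement(user_ids, hash_arm)
puts "modulo: #{mod_agreement}"              # 1.0: identical groups in both experiments
puts "hashed: #{hash_agreement.round(3)}"    # expected to be close to 0.5
```

With the modulo rule the two experiments produce exactly the same groups. With the salted hash, only about half the users share an arm, which is what you would expect from independent 50/50 splits.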
A solution I've heard about is to use hashing. I want to concatenate a `user_id` with an experiment name, hash the result, and then convert that into a float between 0 and 1. I know how to do the hashing (`Digest::MD5::hexdigest('test').to_i(16)`), but I'm confused about the next steps for converting it to a float between 0 and 1.
What are the steps?
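One direct way to continue from the `.to_i(16)` call is to divide the full 128-bit integer by `16**32`, which is the number of possible 32-hex-digit MD5 values. This is sketched here as an alternative, not the approach the answer settles on. The helper name `full_digest_fraction` and the inputs are hypothetical.

```ruby
require 'digest'

# Hypothetical helper: map the whole 128-bit MD5 digest onto a float between 0 and 1.
# 16**32 is one more than the largest 32-hex-digit value, so the exact ratio is < 1.
def full_digest_fraction(user_id, experiment_name)
  digest_int = Digest::MD5.hexdigest("#{user_id}#{experiment_name}").to_i(16)
  # Rational keeps the division exact before the final conversion to Float.
  Rational(digest_int, 16**32).to_f
end

fraction = full_digest_fraction('user_42', 'checkout_button')
puts fraction
```

Using all 32 hex digits gives far finer resolution than a 24-bit prefix needs for traffic splitting. In practice either works, and a short prefix keeps the arithmetic in ordinary integers.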
I figured out the solution by porting the code that's listed here: http://blog.richardweiss.org/2016/12/25/hash-splits.html
```ruby
require 'digest'

test_id_digest = Digest::MD5.hexdigest(user_id.to_s + experiment_name)
test_id_first_digits = test_id_digest[0..5]        # first six hex characters
test_id_final_int = test_id_first_digits.to_i(16)  # integer in 0..0xFFFFFF
ab_split = test_id_final_int.to_f / 0xFFFFFF       # float between 0 and 1
```
The basic idea is to create the digest, take its first six hex characters, convert them to an integer, and divide by the largest six-digit hex number, 0xFFFFFF (16,777,215). The result is a float between 0 and 1.
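The same steps can be wrapped into reusable helpers. This is a minimal sketch. The names `ab_split` and `assign_arm`, the `treatment_share` parameter, and the `:` separator between ID and experiment name are assumptions, not part of the original answer.

```ruby
require 'digest'

# Six hex digits give 16**6 = 16,777,216 possible values (0 through 0xFFFFFF).
MAX_SIX_HEX = 0xFFFFFF

# Hypothetical helper: deterministic float between 0 and 1 for a user/experiment pair.
# The ':' separator (an assumption) keeps "1" + "23abc" from hashing the same
# as "12" + "3abc".
def ab_split(user_id, experiment_name)
  digest = Digest::MD5.hexdigest("#{user_id}:#{experiment_name}")
  digest[0, 6].to_i(16).to_f / MAX_SIX_HEX
end

# Hypothetical arm assignment: users whose split falls below treatment_share get treatment.
def assign_arm(user_id, experiment_name, treatment_share: 0.5)
  ab_split(user_id, experiment_name) < treatment_share ? :treatment : :control
end

first  = assign_arm(1234, 'new_checkout')
second = assign_arm(1234, 'new_checkout')
puts first == second  # true: the same user always gets the same arm
```

Because the hash depends only on the ID and experiment name, assignment is sticky without storing anything. Dividing by `0x1000000` instead of `0xFFFFFF` would give a half-open range [0, 1), which some people prefer so that a share of 1.0 cleanly means "everyone".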
The referenced blog post goes into how to verify that this split is actually random.
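A quick uniformity check in that spirit is sketched below. It is not the blog post's code. It buckets hypothetical user IDs into ten equal-width bins of the split value and computes a Pearson chi-squared statistic against a uniform expectation. The `:` separator and experiment name are assumptions.

```ruby
require 'digest'

# Six-hex-digit split as in the answer, with an assumed ':' separator.
def split_fraction(user_id, experiment_name)
  Digest::MD5.hexdigest("#{user_id}:#{experiment_name}")[0, 6].to_i(16).to_f / 0xFFFFFF
end

# Bucket 100,000 hypothetical user IDs into 10 equal-width bins of the fraction.
n_users = 100_000
n_bins = 10
counts = Array.new(n_bins, 0)
(1..n_users).each do |id|
  # A fraction of exactly 1.0 would index past the end, so clamp it into the last bin.
  bin = [(split_fraction(id, 'exp_a') * n_bins).floor, n_bins - 1].min
  counts[bin] += 1
end

# Pearson chi-squared statistic against a uniform expectation.
expected = n_users.to_f / n_bins
chi_sq = counts.sum { |c| (c - expected)**2 / expected }
puts counts.inspect
puts "chi-squared = #{chi_sq.round(2)} (9 degrees of freedom; 1% critical value is about 21.67)"
```

A statistic well below the critical value is consistent with a uniform split. Repeating the check per experiment name also catches accidental correlations between experiments.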