Reputation: 127587
I'm using Python 2.5 on Linux, in multiple parallel FCGI processes. I use
chars = string.ascii_letters + string.digits
cookie = ''.join([random.choice(chars) for x in range(32)])
to generate distinct cookies. Assuming that the RNG is seeded from /dev/urandom, and that the sequence of random numbers comes from the Mersenne twister, I would expect that there is practically zero chance of collision.
However, I do see regular collisions, even though only a few (<100) users are logged in at any time.
Why are the random numbers not more random?
Upvotes: 16
Views: 11090
Reputation: 536775
I don't know how your FCGI processes are being spawned, but is it possible that it's using fork() after the Python interpreter has started (and the random module has been imported by something), hence effectively seeding two processes' random._inst
s from the same source?
Maybe put some debugging in to check that it is correctly seeding from urandom, and not falling back to the less rigorous time-based seed?
eta re comment: man! That's me stumped then; if the RNG always has different state at startup I can't see how you could possibly get collisions. Weird. Would have to put in a lot of state logging to investigate the particular cases which result in collisions, I guess, which sounds like a lot of work trawling through logs. Could it be (1a) the FCGI server usually doesn't fork, but occasionally does (maybe under load, or something)?
Or (3) some higher-level problem such as a broken HTTP proxy passing the same Set-Cookie to multiple clients?
Upvotes: 1
Reputation: 18826
I had to erase my original answer, which suggested that generator is not seeded from /dev/urandom
, since its source (for Python 3.x) clearly says that it is:
def seed(self, a=None):
"""Initialize internal state from hashable object.
None or no argument seeds from current time or from an operating
system specific randomness source if available.
If a is not None or an int or long, hash(a) is used instead.
"""
if a is None:
try:
a = int(_hexlify(_urandom(16)), 16)
except NotImplementedError:
import time
a = int(time.time() * 256) # use fractional seconds
super().seed(a)
self.gauss_next = None
I therefore humbly accept that there are mysteries in the world that I may not be able to decipher.
Upvotes: 0
Reputation: 75567
This is definitely not a normal collision scenario:
Could this be a concurrency issue?
random.Random
instances for each threadthreading.local()
)os.urandom()
- not system time - so you should get different streams for each thread.Upvotes: 4
Reputation: 57554
It shouldn't be generating duplicates.
import random
chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
def gen():
return ''.join([random.choice(chars) for x in range(32)])
test = [gen() for i in range(100000)]
print len(test), len(set(test)) # 100000 100000
The chances of duplicates is significant with chars = "ab"; 126 duplicates in 1000000 iterations. It's nonexistant with 62.
That said, this isn't a good way to generate cookies, because session cookies need to be unpredictable, to avoid attacks involving stealing other people's session cookies. The Mersenne Twister is not designed for generating secure random numbers. This is what I do:
import os, hashlib
def gen():
return hashlib.sha1(os.urandom(512)).hexdigest()
test = [gen() for i in range(100000)]
print len(test), len(set(test))
... which should be very secure (which is to say, difficult to take a string of session cookies and guess other existing session cookies from them).
Upvotes: 13
Reputation: 3287
To avoid the problem, you can use a sequence of cookies, that are guaranteed to be different (you can e.g. use a set). Each time you give a cookie to someone, you take it from the sequence and you add another to it. Another option is to generate a UUID and use that as a cookie.
Another way to avoid the problem could be to hold a private key, and use a (e.g. MD5) checksum of the private key, with a counter value joined to it. The probability for collisions will then be very low. To be safer, add a few more variables to the checksum, like the current time, the ip address of the user, ...
Libraries to generate cookies exist. Any WSGI implementation probably contains a cookie generator.
If you're only interested in how random your strings are, you could generate a file with, say, one million cookies and perform randomness checks on that file. This, however, is not what I would recommend.
Upvotes: -4