Michael Ekoka

Reputation: 20088

WSGI: making each request truly unique

I'm currently familiarizing myself with the WSGI specification for web applications in Python. I set up Apache (with mod_wsgi) to call a small application that currently just displays the thread ID, in an attempt to observe the uniqueness of each request:

import thread

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    output = "current thread id: %s" % thread.get_ident()
    return [output]

I soon noticed that after a little while, the same threads are being reused by subsequent requests.

If my understanding is correct, in order for my application to have "context-specific" variables, I need to store them with a scheme similar to this:

lock = thread.allocate_lock()
lock.acquire()
thread_id = thread.get_ident()
threadsafe[thread_id]['user'] = request.username
lock.release()

I can then access them from different parts of the application in a similar fashion. The only guarantee that I have in this case is that the value belongs to that specific thread. However, requests using that same thread might still step on each other's toes (e.g. a request accessing left-over values from a previous request). My conclusion is that to handle each request in a truly unique fashion, in addition to the "thread_id", I'll need another key that can differentiate between requests that use the same thread.

Using a unique key such as a uuid, I could do this:

lock.acquire()
request_uuid = uuid.uuid4()  # renamed so it does not shadow the uuid module
thread_id = thread.get_ident()
threadsafe[(thread_id, request_uuid)]['user'] = request.username
lock.release()

but this implies that I have a way to also retrieve the uuid value in a thread-safe way, the same way I can retrieve the thread_id later.

Did I draw the right conclusions? If so, how do I get that additional key?

Edit

It just occurred to me that my problem rests on a false premise. I'm approaching things with the perspective that a thread could be running concurrently with itself, when in fact this is not possible. Requests using the same thread have to run in series. Therefore, I could actually use the uuid to avoid using the thread's stale values, but only after storing it as a thread-safe value itself.

# somewhere early in the request
thread_id = thread.get_ident()
threadsafe[thread_id]['current_uuid'] = uuid.uuid4()

# later
lock.acquire()
thread_id = thread.get_ident()
request_uuid = threadsafe[thread_id]['current_uuid']  # avoid shadowing the uuid module
threadsafe[(thread_id, request_uuid)]['user'] = request.username
lock.release()

Upvotes: 1

Views: 1728

Answers (2)

jdi

Reputation: 92569

This answer is based on new information that developed in the comments of @user590028's answer.

You said that your goal is to have thread-safe persistent data. Because you also said you are familiarizing yourself with the WSGI specs, I feel this link is particularly relevant: Application_Global_Variables

...although global data can be used, it can only be used to cache data which can be safely reused within the context of that single process. You cannot use global data as a means of holding information that must be visible to any request handler no matter which process it runs in.

Your application may be running not only under multiple threads, but potentially under multiple processes. As per the above link, the recommended solution for persistent data (beyond that of the current request) is to use an external storage solution (filesystem, database, memcached, ...).

Update

What you are trying to do with locks in order to save state information seems completely unnecessary. Every request should be considered unique no matter what. If a client makes 10 requests to your application and you want to persist data across those requests, you should use a session key, such as a cookie: when a request arrives without a session, you create the key, return it in the response, and expect future requests to provide it. There are also libraries that aim to provide this functionality for you: http://www.ollycope.com/software/pesto/session.html
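As a rough illustration of the session-key idea, here is a minimal sketch in Python 3 syntax. The cookie name session_id and the in-process SESSIONS dictionary are assumptions made up for this example, not part of any library; with several processes the dictionary would have to be replaced by external storage.

```python
import threading
import uuid
from http.cookies import SimpleCookie

# Hypothetical in-process session store, for illustration only. It is not
# shared between processes, so real deployments need external storage.
SESSIONS = {}
SESSIONS_LOCK = threading.Lock()  # the store is shared by all threads


def application(environ, start_response):
    cookie = SimpleCookie(environ.get('HTTP_COOKIE', ''))
    headers = [('Content-Type', 'text/plain')]

    with SESSIONS_LOCK:
        if 'session_id' in cookie and cookie['session_id'].value in SESSIONS:
            # Returning client: reuse the key it sent back.
            session_id = cookie['session_id'].value
        else:
            # New client: create a key and hand it over in the response.
            session_id = str(uuid.uuid4())
            SESSIONS[session_id] = {'hits': 0}
            headers.append(
                ('Set-Cookie', 'session_id=%s; Path=/; HttpOnly' % session_id))
        SESSIONS[session_id]['hits'] += 1
        hits = SESSIONS[session_id]['hits']

    body = ('requests in this session: %d' % hits).encode('utf-8')
    start_response('200 OK', headers)
    return [body]
```

The lock here protects the shared dictionary only; nothing is keyed by thread ID, because the session key travels with the client rather than with the thread.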

A WSGI application has an entry point; in this case your example defines it as a function called "application". It could also have been a class or anything callable. Your variables are context-specific by nature because of their scope. Whatever you do within that scope is completely separate from any other thread running the same handler. The "application" function could have been more complex, calling other functions and passing its variables around until ultimately returning its response body. You could also have created a class instance that contains all the functionality needed to process the request and generate the response, and that makes use of its own instance variables.
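To make the scope argument concrete, here is a small sketch in Python 3 syntax. The RequestHandler class is a hypothetical name invented for this example; the point is only that a fresh object is created on every call, so nothing is left over from an earlier request even when the thread is reused.

```python
import threading


class RequestHandler:
    """Holds the state of exactly one request (hypothetical helper)."""

    def __init__(self, environ):
        self.environ = environ
        # Instance attribute: belongs to this request only.
        self.user = environ.get('REMOTE_USER', 'anonymous')

    def body(self):
        text = 'user: %s, thread id: %s' % (self.user, threading.get_ident())
        return text.encode('utf-8')


def application(environ, start_response):
    # A new handler per call; its attributes vanish when the call returns.
    handler = RequestHandler(environ)
    body = handler.body()
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

Two requests served by the same thread still get two separate RequestHandler objects, so no lock and no thread-ID lookup are needed for this kind of per-request data.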

If neither of the previous two suggestions applies to what you are asking, the only remaining possibility I see is that you actually do want to store your data in a database, or filesystem, or memcached, or redis, and so on. uuid4 will be unique, but its value only has meaning if you pass it on in the response and have the client return it, so that it stays associated with that data.

Upvotes: 1

user590028

Reputation: 11730

You are right. Thread IDs are not guaranteed to be unique over time. Consider UUIDs instead, something like str(uuid.uuid4()).
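One possible way to keep such an ID retrievable later in the request, sketched in Python 3 syntax: store it in the environ dictionary, which is created per request and passed along to your code. The key name myapp.request_id and the describe helper are assumptions invented for this example.

```python
import uuid


def describe(environ):
    # Any code that is handed environ can read the ID back; no global lookup.
    return 'request id: %s' % environ['myapp.request_id']


def application(environ, start_response):
    # 'myapp.request_id' is a hypothetical key name chosen for this sketch.
    environ['myapp.request_id'] = str(uuid.uuid4())
    body = describe(environ).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body]
```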

Upvotes: 0
