pythonsocketsasynchronousnetwork-programmingc10k

Reputation: 41633

Writing a socket-based server in Python, recommended strategies?

I was recently reading this document which lists a number of strategies that could be employed to implement a socket server. Namely, they are:

Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
Serve many clients with each thread, and use nonblocking I/O and readiness change notification
Serve many clients with each server thread, and use asynchronous I/O
serve one client with each server thread, and use blocking I/O
Build the server code into the kernel

Now, I would appreciate a hint on which should be used in CPython, which we know has some good points, and some bad points. I am mostly interested in performance under high concurrency, and yes a number of the current implementations are too slow.

So if I may start with the easy one, "5" is out, as I am not going to be hacking anything into the kernel.

"4" Also looks like it must be out because of the GIL. Of course, you could use multiprocessing in place of threads here, and that does give a significant boost. Blocking IO also has the advantage of being easier to understand.

And here my knowledge wanes a bit:

"1" is traditional select or poll which could be trivially combined with multiprocessing.

"2" is the readiness-change notification, used by the newer epoll and kqueue

"3" I am not sure there are any kernel implementations for this that have Python wrappers.

So, in Python we have a bag of great tools like Twisted. Perhaps they are a better approach, though I have benchmarked Twisted and found it too slow on a multiple processor machine. Perhaps having 4 twisteds with a load balancer might do it, I don't know. Any advice would be appreciated.

Upvotes: 9

Answers (6)

Douglas Leeder

Reputation: 53310

asyncore is basically "1" - It uses select internally, and you just have one thread handling all requests. According to the docs it can also use poll. (EDIT: Removed Twisted reference, I thought it used asyncore, but I was wrong).

"2" might be implemented with python-epoll (Just googled it - never seen it before). EDIT: (from the comments) In python 2.6 the select module has epoll, kqueue and kevent build-in (on supported platforms). So you don't need any external libraries to do edge-triggered serving.

Don't rule out "4", as the GIL will be dropped when a thread is actually doing or waiting for IO-operations (most of the time probably). It doesn't make sense if you've got huge numbers of connections of course. If you've got lots of processing to do, then python may not make sense with any of these schemes.

For flexibility maybe look at Twisted?

In practice your problem boils down to how much processing you are going to do for requests. If you've got a lot of processing, and need to take advantage of multi-core parallel operation, then you'll probably need multiple processes. On the other hand if you just need to listen on lots of connections, then select or epoll, with a small number of threads should work.

Upvotes: 7

gdamjan

Reputation: 1038

One sollution is gevent. Gevent maries a libevent based event polling with lightweight cooperative task switching implemented by greenlet.

What you get is all the performance and scalability of an event system with the elegance and straightforward model of blocking IO programing.

(I don't know what the SO convention about answering to realy old questions is, but decided I'd still add my 2 cents)

Upvotes: 3

Roboprog

Reputation: 3144

How about "fork"? (I assume that is what the ForkingMixIn does) If the requests are handled in a "shared nothing" (other than DB or file system) architecture, fork() starts pretty quickly on most *nixes, and you don't have to worry about all the silly bugs and complications from threading.

Threads are a design illness forced on us by OSes with too-heavy-weight processes, IMHO. Cloning a page table with copy-on-write attributes seems a small price, especially if you are running an interpreter anyway.

Sorry I can't be more specific, but I'm more of a Perl-transitioning-to-Ruby programmer (when I'm not slaving over masses of Java at work)

Update: I finally did some timings on thread vs fork in my "spare time". Check it out:

http://roboprogs.com/devel/2009.04.html

Expanded: http://roboprogs.com/devel/2009.12.html

Upvotes: 3

Eugene Morozov

Reputation: 15806

Can I suggest additional links?

cogen is a crossplatform library for network oriented, coroutine based programming using the enhanced generators from python 2.5. On the main page of cogen project there're links to several projects with similar purpose.

Upvotes: 2

vartec

Reputation: 134581

http://docs.python.org/library/socketserver.html#asynchronous-mixins

As for multi-processor (multi-core) machines. With CPython due to GIL you'll need at least one process per core, to scale. As you say that you need CPython, you might try to benchmark that with ForkingMixIn. With Linux 2.6 might give some interesting results.

Other way is to use Stackless Python. That's how EVE solved it. But I understand that it's not always possible.

Upvotes: 1

cdleary

Reputation: 71424

I like Douglas' answer, but as an aside...

You could use a centralized dispatch thread/process that listens for readiness notifications using select and delegates to a pool of worker threads/processes to help accomplish your parallelism goals.

As Douglas mentioned, however, the GIL won't be held during most lengthy I/O operations (since no Python-API things are happening), so if it's response latency you're concerned about you can try moving the critical portions of your code to CPython API.

Upvotes: 1

Writing a socket-based server in Python, recommended strategies?

Answers (6)

Related Questions