Reputation: 25366
I'm new to threading and was wondering if it's bad to spawn a lot of threads for various tasks (in a server environment). Do threads take up a lot more memory/cpu compared to more linear programming?
Upvotes: 8
Views: 4064
Reputation: 654
Since you're new to threading, there's something else worth bearing in mind - which I prefer to view as parallel scoping of values.
With traditional linear/sequential programming for any given object you only have one thread accessing and changing a piece of data. This is generally made safe due to having lexical scope. Specifically one function can operate on a variable's value without affect a global value. If you don't have lexical scope, or poor lexical scoping, then changing the value of a variable named "foo" in one function affects another called "foo". It's less commonly a problem these days, but still common enough to be alluding to.
With threading, you have the same issue, in more subtle ways. Whilst you still have lexical scoping helping you - in that a local value "X" inside one function is independent of another local value called "X" in another, the fact that data structures are mutable is a major source of bugs in threading.
Specifically, if a function is passed a mutable value, then in a threaded environment unless you have taken care, that function cannot guarantee that the value is not being changed by anything else.
This shared state is the source of probably 90-99% of bugs in threaded systems, and can be very hard to debug. As a result, if you're going to write a threaded system you should try to bear in mind the distance that your shared values will travel - ie the scope of parallel access.
In order to limit bugs you have a handful of options which are known to work:
1 is most equivalent to unix pipelines. 3 is logically equivalent to version control, and is normally referred to as software transactional memory.
1 & 3 are modes of concurrency supported in Kamaelia which aims to eliminate bugs caused by class 2. (disclosure, I run the Kamaelia project) 2 isn't supported because it relies on "always getting everything right".
No matter which approach you use to solve your problem though, bearing in mind this issue, and the ways of dealing with it, and planning upfront how you intend to deal with it will save you no end of headaches later on.
Upvotes: 6
Reputation: 882701
Excellent answers all around! I just wanted to add that, if you decide to go with a thread pool (generally advisable if you decide that threads are suitable for your app), you would be well advised to reuse (and possibly adapt) an existing general-purpose implementation, such as Christopher Arndt's, rather than roll your own from scratch (which is always an instructive undertaking, to be sure, but less productive in terms of time it takes to get you properly working and debugged code;-).
Upvotes: 2
Reputation: 198827
You might consider using microthreads if you need concurrency. There's a good article on the subject here. The advantage is that you're not creating "real" threads that eat up resources and cause context switching. Of course the downside is that you're not taking advantage of multicore technology.
This is the approach Kamaelia and stackless take.
If you're doing I/O, you might consider using asynchronous I/O as well. This can be a real pain to program, but it prevents you from having threads fight over CPU time. Unfortunately, I don't know of any platform independent way to do this in Python.
Upvotes: 3
Reputation: 54935
Threads do have some CPU and memory overhead but unless you spawn hundreds or thousands of them, it usually isn't all that significant. The more important issue is that threads make correct programming a lot more difficult if you share any writable datastructures between concurrent threads. See the paper The Problem with Threads for an explanation why they aren't a good abstraction for concurrent programming.
Upvotes: 2
Reputation: 9215
You have to consider multiple things if you want to use multiple threads:
The conclusion I take from that:
Upvotes: 20
Reputation: 5994
It depends on the platform.
Windows threads have to commit around 1MB of memory when created. It's better to have some kind of threadpool than spawning threads like a madman, to make sure you never allocate more than a fixed amount of threads. Also, when you work in Python, you're subject to the Global Interpreter Lock which hinders code that rely on lots of concurrent threads.
On Unix, you may consider using different processes instead of threads, or look at other asynchronous way of doing things (the Twisted server framework has interesting ways of handling asynchronous network tasks, and if you feel really adventurous you can take a look at stackless Python, a continuation framework which don't really use kernel threads at all).
Upvotes: 5