Reputation: 3192
I've already posted this question here, but since it's maybe not that Qt-specific, I thought I might try my chance here as well. I hope it's not inappropriate to do that (just tell me if it is).
I’ve developed a small scientific program that performs some mathematical computations. I’ve tried to optimize it so that it’s as fast as possible. Now I’m almost done deploying it for Windows, Mac and Linux users. But I have not been able to test it on many different computers yet.
Here’s what troubles me: To deploy for Windows, I’ve used a laptop which has both Windows 7 and Ubuntu 12.04 installed on it (dual boot). I compared the speed of the app running on these two systems, and I was shocked to observe that it’s at least twice as slow on Windows! I wouldn’t have been surprised if there were a small difference, but how can one account for such a difference?
Here are a few details:
I’m bothered that the app is so much slower (2 to 4 times) on Windows, and it’s really weird. On the other hand, I haven’t tried it on other computers with Windows yet. Still, do you have any idea what could explain the difference?
Additional info: some data…
Even though Windows seems to be using both cores, I’m thinking this might have something to do with thread management. Here’s why:
Sample Computation n°1 (this one launches 2 QThreads):
Sample Computation n°2 (this one launches 3 QThreads):
Sample Computation n°3 (this one launches 6 QThreads):
where:
(Of course, it's not shocking that PC2 is faster. What's incredible to me is the difference between PC1-windows and PC1-linux).
Note: I've also tried running the program on a recent PC (4 or 8 cores @ ~3 GHz, I don't remember exactly) under Mac OS; speed was comparable to PC2-linux (or slightly faster).
EDIT: I'll answer here a few questions I was asked in the comments.
I just installed Qt SDK on Windows, so I guess I have the latest version of everything (including MinGW?). The compiler is MinGW. Qt version is 4.8.1.
I use no optimization flags because I noticed that they are automatically used when I build in release mode (with Qt Creator). It seems to me that if I write something like QMAKE_CXXFLAGS += -O1, this only has an effect in debug build.
Lifetime of threads etc.: this is pretty simple. When the user clicks the "Compute" button, 2 to 6 threads are launched simultaneously (depending on what is being computed), and they are terminated when the computation ends. Nothing too fancy. Every thread just does heavy computations (except one, actually, which performs a (not so) "small" computation every 30 ms, basically checking whether the error is small enough).
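To make the thread layout described above concrete, here is a minimal sketch of that pattern: a few workers plus one monitor that wakes up every 30 ms and stops everything once the error is small enough. It is not the actual program. It uses `std::thread` instead of `QThread` to stay self-contained (assuming a C++11 toolchain), the name `runUntilConverged` is hypothetical, and the "error" is a made-up stand-in that simply shrinks as the workers make progress.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical sketch: `workers` compute threads (at least 1) plus one
// monitor thread that polls every 30 ms. Returns the last error seen.
double runUntilConverged(unsigned workers, double tolerance) {
    std::atomic<long> steps(0);     // progress made by all workers
    std::atomic<bool> stop(false);  // set by the monitor to end the run

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&steps, &stop] {
            while (!stop.load()) {
                steps.fetch_add(1);  // stand-in for one unit of real work
            }
        });
    }

    double error = 1.0;
    std::thread monitor([&steps, &stop, &error, tolerance] {
        for (;;) {
            // Made-up error model: shrinks as more steps are completed.
            error = 1.0 / (1.0 + static_cast<double>(steps.load()));
            if (error < tolerance) break;
            std::this_thread::sleep_for(std::chrono::milliseconds(30));
        }
        stop.store(true);  // tell the workers to finish
    });

    monitor.join();
    for (auto& th : pool) th.join();
    return error;
}
```

Calling `runUntilConverged(2, 1e-3)` would mimic the 2-worker case. The only point is the shape: workers never block on the monitor, and the monitor only reads a shared counter.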
EDIT: latest developments and partial answers
Here are some new developments that provide answers about all this:
I wanted to determine whether the difference in speed really had something to do with threads or not. So I modified the program so that the computation only uses 1 thread; that way we are pretty much comparing the performance on "pure C++ code". It turned out that Windows was now only slightly slower than Linux (something like 15%). So I guess that a small (but not insignificant) part of the difference is intrinsic to the system, but the largest part is due to thread management.
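For anyone who wants to reproduce this kind of 1-thread versus N-thread comparison on their own machine, here is a rough sketch of a timing harness. It is an assumption-laden stand-in, not my actual code: it uses `std::thread` and `std::chrono` (C++11) rather than `QThread`, and `burn` and `timeRun` are hypothetical names for a dummy CPU-bound workload and the timer around it.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical CPU-bound workload standing in for the real computation.
double burn(std::size_t iterations) {
    double acc = 0.0;
    for (std::size_t i = 1; i <= iterations; ++i) {
        acc += std::sqrt(static_cast<double>(i));
    }
    return acc;
}

// Splits `total` iterations evenly across `threads` workers and returns
// the elapsed wall-clock time in milliseconds.
double timeRun(std::size_t total, unsigned threads) {
    std::vector<double> results(threads, 0.0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&results, t, total, threads] {
            results[t] = burn(total / threads);  // each worker gets an equal share
        });
    }
    for (auto& th : pool) th.join();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Comparing `timeRun(n, 1)` with `timeRun(n, 2)` on each OS, with the same compiler flags, should show whether the gap comes from the single-threaded code or only appears once several threads are involved.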
As someone (Luca Carlon, thanks for that) suggested in the comments, I tried building the application with the compiler for Microsoft Visual Studio (MSVC) instead of MinGW. And surprise, the computation (with all the threads and everything) was now "only" 20% to 50% slower than Linux! I think I'm going to go ahead and be content with that. Weirdly though, I noticed that the "pure C++" computation (with only one thread) was now even slower than with MinGW, which must account for the overall difference. So as far as I can tell, MinGW is slightly better than MSVC except that it handles threads like a moron.
So, I’m thinking either there’s something I can do to make MinGW (ideally I’d rather use it than MSVC) handle threads better, or it just can’t. I would be amazed if it couldn’t: how could that not be well known and documented? Although I guess I should be careful about drawing conclusions too quickly, since I’ve only compared things on one computer (for the moment).
Upvotes: 17
Views: 11487
Reputation: 80
I have noticed exactly the same behavior on my PC. I am running Windows 7 (64-bit), Ubuntu (64-bit) and OS X (Lion, 64-bit), and my program compares 2 XML files (more than 60 MB each). It uses multithreading too (2 threads):
-Windows : 40sec
-Linux : 14sec (!!!)
-OSX : 22sec.
I use my own class for threads (not the Qt one), which uses "pthread" on Linux/OS X and "threads" on Windows. I use the Qt/MinGW compiler because I need the XML class from Qt.
I have found no way (for now) to get similar performance on the 3 OSes... but I hope I will!
I think that another reason may be memory: my program uses about 500 MB of RAM. So I think that Unix is managing it better because, in single-threaded mode, Windows is exactly 1.89 times slower and I don't think that Linux could be more than 2 times slower!
Upvotes: 1
Reputation: 12866
It's probably the memory allocator; try using jemalloc, or tcmalloc from Google. Glibc's allocator (derived from ptmalloc2) is significantly better than the old crusty allocator in MSVC's CRT. The comparable option from Microsoft is the Concurrency CRT, but you cannot simply drop it in as a replacement.
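Before swapping allocators, it may be worth checking whether allocation is the bottleneck at all. Below is a minimal sketch of an allocation-heavy micro-benchmark, assuming C++11 and `std::thread`. The names `churn` and `timeChurn` and the 64-byte block size are arbitrary choices for illustration, not anything taken from the program in the question.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Allocates `batch` small blocks, frees them all, and repeats `rounds` times.
// Returns the total number of blocks allocated.
std::size_t churn(std::size_t rounds, std::size_t batch) {
    std::size_t allocated = 0;
    std::vector<std::unique_ptr<char[]>> live;
    live.reserve(batch);
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < batch; ++i) {
            live.push_back(std::unique_ptr<char[]>(new char[64]));
            ++allocated;
        }
        live.clear();  // frees the whole batch
    }
    return allocated;
}

// Runs churn() on `threads` threads at once and returns elapsed milliseconds.
double timeChurn(unsigned threads, std::size_t rounds, std::size_t batch) {
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([rounds, batch] { churn(rounds, batch); });
    }
    for (auto& th : pool) th.join();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

If `timeChurn` grows much faster with the thread count on one platform than on the other, that points at the allocator. If it scales about the same on both, the allocator is probably not the culprit.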
Upvotes: 0
Reputation: 225
You might be seeing performance differences caused by how mutexes behave on Windows and Linux.
Pure mutex code on Windows can incur a wait of about 15 ms every time there is contention for the resource when locking. A better-performing synchronization mechanism on Windows is the critical section, which in most cases does not suffer the locking penalty that regular mutexes do.
I have found that on Linux, regular mutexes perform the same as critical sections on Windows.
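As a rough illustration of the idea, here is a sketch of a small lock wrapper that uses a Win32 `CRITICAL_SECTION` on Windows and falls back to `std::mutex` elsewhere. The class name `Lock` and the helper `countWithLock` are hypothetical, and the code assumes a C++11 compiler with `std::thread` available. It only shows how the two primitives can be hidden behind one interface. It does not measure anything by itself.

```cpp
#include <mutex>
#include <thread>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: CRITICAL_SECTION on Windows, std::mutex elsewhere.
class Lock {
public:
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;
#ifdef _WIN32
    Lock() { InitializeCriticalSection(&cs_); }
    ~Lock() { DeleteCriticalSection(&cs_); }
    void lock() { EnterCriticalSection(&cs_); }
    void unlock() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
#else
    Lock() = default;
    void lock() { m_.lock(); }
    void unlock() { m_.unlock(); }
private:
    std::mutex m_;
#endif
};

// Each thread increments a shared counter under the lock.
long countWithLock(unsigned threads, long incrementsPerThread) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, incrementsPerThread] {
            for (long i = 0; i < incrementsPerThread; ++i) {
                std::lock_guard<Lock> guard(lock);  // Lock has lock()/unlock()
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Timing `countWithLock(4, 1000000)` on both systems, and on Windows also against a version built on a kernel mutex, would show whether lock contention explains the gap in your case.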
Upvotes: 0
Reputation: 5555
I have heard of one case where Windows was extremely slow with writing files if you do it wrongly. (This has nothing to do with Qt.)
The problem in that case was that the developer used a SQLite database, wrote some 10000 datasets, and did a SQL COMMIT after each insert. This caused Windows to write the whole DB file to disk each time, while Linux would only update the buffered version of the filesystem inode in the RAM. The speed difference was even worse in that case: 1 second on Linux vs. 1 minute on Windows. (After he changed SQLite to commit only once at the end, it was also 1 second on Windows.)
So if you're writing the results of your computation to disk, you might want to check if you're calling fsync() or fflush() too often. If your writing code comes from a library, you can use strace for this (Linux-only, but should give you a basic idea).
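To get a feel for the effect, here is a small sketch that writes the same records while flushing at different intervals. The function name `writeRecords` and the record format are made up for illustration. Note that `fflush()` only pushes data from the C library buffer to the OS, so this models the "flush after every write" pattern rather than a full `fsync()` to disk.

```cpp
#include <cstddef>
#include <cstdio>

// Writes `lines` records to `path`, calling fflush() every `flushEvery`
// records (0 means: never flush explicitly, only on close).
// Returns the number of explicit flushes, or -1 if the file cannot be opened.
long writeRecords(const char* path, std::size_t lines, std::size_t flushEvery) {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return -1;
    long flushes = 0;
    for (std::size_t i = 0; i < lines; ++i) {
        std::fprintf(f, "record %lu\n", static_cast<unsigned long>(i));
        if (flushEvery != 0 && (i + 1) % flushEvery == 0) {
            std::fflush(f);  // the per-write cost under suspicion
            ++flushes;
        }
    }
    std::fclose(f);  // flushes whatever is still buffered
    return flushes;
}
```

Timing `writeRecords("out.txt", 10000, 1)` against `writeRecords("out.txt", 10000, 0)` on each OS should show how much of the runtime goes into flushing rather than computing.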
Upvotes: 0
Reputation: 728
Another possibility: on Linux the Qt libraries may already be loaded (for example, if you use KDE), while on Windows the libraries must be loaded first, which adds to the measured computation time. To check how much time library loading costs your application, you could write a dummy test with pure C++ code.
Upvotes: 4