Reputation: 4050
This code:
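#include <cstdio>   // getchar
#include <iostream>
#include <chrono>
#include <functional>
#include <time.h>
int main() {
    time_t b4 = time(NULL);  // wall-clock time before printing (1-second resolution)
    for (int i = 0; i < 50000; i++)
        std::cout << i << " ";
    std::cout << std::endl;
    time_t a4 = time(NULL);  // wall-clock time after printing
    std::cout << "Time taken is " << difftime(a4, b4);
    getchar();  // keep the console window open
}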
on Windows, when built and run with Visual Studio using the commands:
CL.exe /c /Zi /nologo /W3 /WX- /diagnostics:column /sdl /O2 /Oi /GL /D _MBCS /Gm- /EHsc /MD /GS /Gy /fp:precise /permissive- /Zc:wchar_t /Zc:forScope /Zc:inline /FA /Fa"x64\Release\\" /Fo"x64\Release\\" /Fd"x64\Release\vc142.pdb" /Gd /TP /FC /errorReport:prompt ..\src\console_printf.cpp
console_printf.cpp
Link:
link.exe /ERRORREPORT:PROMPT /OUT:"Release\windows.exe" /NOLOGO kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /DEBUG:FULL /PDB:"Release\windows.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG:incremental /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"Release\windows.lib" /MACHINE:X64 x64\Release\console_printf.obj
finally prints (after printing ... 49998 49999)
Time taken is 15
The same code when compiled/built/run on Linux with:
g++ -c -O2 -MMD -MP -MF "build/Release/GNU-Linux/_ext/511e4115/console_printf.o.d" -o build/Release/GNU-Linux/_ext/511e4115/console_printf.o ../src/console_printf.cpp
mkdir -p dist/Release/GNU-Linux
finally prints (after printing ... 49998 49999)
Time taken is 1
That is, console/terminal printing on Linux is much faster. Both tests were built in release mode with optimizations turned on. Although the tests were run on two separate machines (one running Windows/Visual Studio, the other running Linux), the two machines have comparable computing power.
Is there a way to get Windows console printing as fast as Linux? I run numerically intensive, iterative code which periodically displays progress on the console, and I am now worried that Windows console printing is inflating the recorded time: not because of the algorithm, but because console printing has become the bottleneck.
Upvotes: 1
Views: 492
Reputation: 2214
Your standard library implementation may be part of your problem. I ran the following code with plain vanilla Visual C++:
#define WRITE_CONSOLE_API
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <chrono>
#include <functional>
#include <memory>    // std::uninitialized_fill_n
#include <stdlib.h>  // _itoa
#include <string.h>  // strlen
#include <time.h>
#include <windows.h>
int main() {
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    LARGE_INTEGER start;
    LARGE_INTEGER stop;
    std::ios_base::sync_with_stdio(true);
#ifdef WRITE_CONSOLE_API
    char buf[20];
    static char buf2[2] = { '\r', '\0' };  // separator written between numbers
    std::uninitialized_fill_n(buf, 20, '\0');
    auto con = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD l;
    DWORD lr;
#endif
    QueryPerformanceCounter(&start);
#ifdef WRITE_CONSOLE_API
    // Windows API path: format each number by hand and write it with WriteConsoleA.
    for (int i = 0; i < 50000; i++)
    {
        if (i)
            WriteConsoleA(con, buf2, 1, &lr, NULL);
        _itoa(i, buf, 10);
        l = static_cast<DWORD>(strlen(buf));
        WriteConsoleA(con, buf, l, &lr, NULL);
    }
    buf2[0] = '\n';
    WriteConsoleA(con, buf2, 1, &lr, NULL);
#else
    // Standard library path: the same output through std::cout.
    for (int i = 0; i < 50000; i++)
        std::cout << '\r' << i;
    std::cout << std::endl;
#endif
    QueryPerformanceCounter(&stop);
    double diff = static_cast<double>(stop.QuadPart - start.QuadPart);
    std::cout << "Time taken is " << diff / freq.QuadPart << " secs\n";
    std::cin.ignore(1);
}
with WRITE_CONSOLE_API defined (where it used the Windows API call WriteConsole) and also with it not defined (where it used std::cout).
With WRITE_CONSOLE_API defined, the result was
Time taken is 2.12448 secs
With WRITE_CONSOLE_API not defined, the result was
Time taken is 6.25676 secs
If you use a space instead of \r (i.e. to force the console window to scroll), you get
Time taken is 3.02435 secs
with WRITE_CONSOLE_API defined, and
Time taken is 7.27557 secs
with WRITE_CONSOLE_API not defined. Scrolling appears to consistently add about 1 second to both times.
You should try this on your own machine, because the timings may vary.
The timings above come from a debug build, so with no optimization. With optimization, the standard library version dropped to 6.8 seconds (scrolling) and 5.6 seconds (non-scrolling), but the Windows API version didn't change.
If you truly want to separate the program's actual work from the vagaries of the operating system, you could create a thread to do the work, and use the other thread to write progress to the console. You really only need to connect them with the actual progress count (held as a std::atomic<some_int_type>).
Upvotes: 3
Reputation: 57698
If you want to improve console I/O (which is the bottleneck for most I/O-bound applications), print to a buffer, then write the buffer to the console as a single block.
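#include <string>
#include <iostream>
int main ()
{
    std::string buffer;
    // 50000 numbers plus separators come to a little under 300000 characters.
    buffer.reserve(300000);
    for (unsigned int i = 0; i < 50000; ++i)
    {
        // Append directly to the buffer. A std::ostringstream constructed from
        // the string would work on its own copy and leave buffer empty.
        buffer += std::to_string(i);
        buffer += " ";
    }
    buffer += "\n";
    // One block write for the whole output.
    std::cout.write(buffer.c_str(), static_cast<std::streamsize>(buffer.length()));
    return 0;
}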
The above code uses a std::string for its buffer. All the numbers are formatted (human readable) into the buffer. The buffer is then written to the console using a block write.
The idea behind buffer.reserve() is to allocate a large enough buffer to reduce the reallocations.
Upvotes: 2