Reputation: 353
I am working on a sorting project and I've come to the point where a main bottleneck is reading in the data. It takes my program about 20 seconds to sort 100,000,000 integers read in from stdin using cin
and std::ios::sync_with_stdio(false);
but it turns out that 10 of those seconds is reading in the data to sort. We do know how many integers we will be reading in (the count is at the top of the file we need to sort).
How can I make this faster? I know it's possible because a student in a previous semester was able to do counting sort in a little over 3 seconds (and that's basically purely read time).
The program is just fed the contents of a file with integers separated by newlines like $ ./program < numstosort.txt
Thanks
Here is the relevant code:
std::ios::sync_with_stdio(false);
int max;
cin >> max;
short num;
short* a = new short[max];
int n = 0;
while(cin >> num) {
a[n] = num;
n++;
}
Upvotes: 6
Views: 2967
Reputation: 1
This will get your data into memory about as fast as possible, assuming Linux/POSIX running on commodity hardware. Note that since you apparently aren't allowed to use compiler optimizations, C++ IO is not going to be the fastest way to read data. As others have noted, without optimizations the C++ code will not run anywhere near as fast as it can.
Given that the redirected file is already open as stdin
/STDIN_FILENO
, use low-level system call/C-style IO. That won't need to be optimized, as it will run just about as fast as possible:
struct stat sb;
int rc = ::fstat( STDIN_FILENO, &sb );
// use C-style calloc() to get memory that's been
// set to zero as calloc() is often optimized to be
// faster than a new followed by a memset().
char *data = (char *)::calloc( 1, sb.st_size + 1 );
size_t totalRead = 0UL;
while ( totalRead < sb.st_size )
{
ssize_t bytesRead = ::read( STDIN_FILENO,
data + totalRead, sb.st_size - totalRead );
if ( bytesRead <= 0 )
{
break;
}
totalRead += bytesRead;
}
// data is now in memory - start processing it
That code will read your data into memory as one long C-style string. And the lack of compiler optimizations won't matter one bit as it's all almost bare-metal system calls.
Using fstat()
to get the file size allows allocating all the needed memory at once - no realloc()
or copying data around is necessary.
You'll need to add some error checking, and a more robust version of the code would check to be sure the data returned from fstat()
actually is a regular file with an actual size, and not a "useless use of cat" such as cat filename | YourProgram
, because in that case the fstat()
call won't return a useful file size. You'll need to examine the sb.st_mode
field of the struct stat
after the call to see what the stdin
stream really is:
::fstat( STDIN_FILENO, &sb );
...
if ( S_ISREG( sb.st_mode ) )
{
// regular file...
}
(And for really high-performance systems, it can be important to ensure that the memory pages you're reading data into are actually mapped in your process address space. Performance can really stall if data arrives faster than the kernel's memory management system can create virtual-to-physical mappings for the pages data is getting dumped into.)
To handle a large file as fast as possible, you'd want to go multithreaded, with one thread reading data and feeding one or more data processing threads so you can start processing data before you're done reading it.
Edit: parsing the data.
Again, preventing compiler optimizations probably makes the overhead of C++ operations slower than C-style processing. Based on that assumption, something simple will probably run faster.
This would probably work a lot faster in a non-optimized binary, assuming the data is in a C-style string read in as above:
char *next;
long count = ::strtol( data, &next, 0 );
long *values = new long[ count ];
for ( long ii = 0; ii < count; ii++ )
{
values[ ii ] = ::strtol( next, &next, 0 );
}
That is also very fragile. It relies on strtol()
skipping over leading whitespace, meaning if there's anything other than whitespace between the numeric values it will fail. It also relies on the initial count of values being correct. Again - that code will fail if that's not true. And because it can replace the value of next
before checking for errors, if it ever goes off the rails because of bad data it'll be hopelessly lost.
But it should be about as fast as possible without allowing compiler optimizations.
That's what crazy about not allowing compiler optimizations. You can write simple, robust C++ code to do all your processing, make use of a good optimizing compiler, and probably run almost as fast as the code I posted - which has no error checking and will fail spectacularly in unexpected and undefined ways if fed unexpected data.
Upvotes: 3
Reputation: 4189
Can you ran this little test in compare to your test with and without commented line?
#include <iostream>
#include <cstdlib>
int main()
{
std::ios::sync_with_stdio(false);
char buffer[20] {0,};
int t = 0;
while( std::cin.get(buffer, 20) )
{
// t = std::atoi(buffer);
std::cin.ignore(1);
}
return 0;
}
Pure read test:
#include <iostream>
#include <cstdlib>
int main()
{
std::ios::sync_with_stdio(false);
char buffer[1024*1024];
while( std::cin.read(buffer, 1024*1024) )
{
}
return 0;
}
Upvotes: 0
Reputation: 57678
You may be able to speed up your program by reading the data into a buffer, then converting the text in the buffer to internal representation.
The thought behind this is that all stream devices like to keep streaming. Starting and stopping the stream wastes time. A block read transfers a lot of data with one transaction.
Although cin
is buffered, by using cin.read
and a buffer, you can make the buffer a lot bigger than cin
uses.
If the data has fixed width fields, there are opportunities to speed up the input and conversion processes.
Edit 1: Example
const unsigned int BUFFER_SIZE = 65536;
char text_buffer[BUFFER_SIZE];
//...
cin.read(text_buffer, BUFFER_SIZE);
//...
int value1;
int arguments_scanned = snscanf(&text_buffer, REMAINING_BUFFER_SIZE,
"%d", &value1);
The tricky part is handling the cases where the text of a number is cut off at the end of the buffer.
Upvotes: 0
Reputation: 2442
You can make it faster if you use a SolidState hard drive. If you want to ask something about code performance, you need to post how are you doing things in the first place.
Upvotes: 3