Reputation: 4516
I've got an array of char*
in a file.
The company I work for stores data in flat files.. Sometimes the data is sorted, but sometimes it's not.
I'd like to sort the data in the files.
Now I could write the code to do this, from scratch. Is there an easier way?
Of course an in-place sort would be the best option. I'm working on large files and have little RAM. But I'll consider all options.
All strings are the same length.
This is some sample data:
the data is of fixed length
the Data is of fixed length
thIS data is of fixed lengt
This would represent three records of length 28. The app knows the length. Each record ends with CRLF (\r\n
), though it shouldn't matter for this sort.
Upvotes: 4
Views: 7232
Reputation:
You probably want to look into memory mapped files (see http://en.wikipedia.org/wiki/Memory-mapped_file), mmap() function (http://en.wikipedia.org/wiki/Mmap) on POSIX-complaint OSes. You'll essentially get a pointer to contiguous memory representing the file's contents.
The good side is that the OS will take care of loading parts of the file into memory and unloading them again, as needed.
One downside is that you'll need to resolve to some form of file locking to avoid corruption if more than one process is likely to access the file.
Another downside is that this doesn't guarantee good performance - to do that, you'll need a sorting algorithm that tries to avoid constantly loading and unloading pages (unless of course you have enough memory to load the entire file into memory).
Hope this has given you some ideas!
Upvotes: 0
Reputation: 10819
A few things come to mind:
std::sort
, qsort
, and any other general-purpose sorting) always requires O(N log N) time. Radix sorting compares a single digit at a time (starting at str[0]
and ending at str[K-1]
for a K-lenth string), and overall can require only O(N) time to execute.Consult the Internetfor a much better detailed description of radix sorting algorithms than I can provide. Aside from what I've said, I would avoid all of the other solutions that use standard libarary sorting facilities. They just aren't designed your particular problem, unfortunately.
Upvotes: 0
Reputation: 754820
The canonical way to sort an array of character strings in C, and therefore an available but not necessarily recommended way to do so in C++, uses a level of indirection to strcmp()
:
static int qsort_strcmp(const void *v1, const void *v2)
{
const char *s1 = *(char * const *)v1;
const char *s2 = *(char * const *)v2;
return(strcmp(s1, s2));
}
static void somefunc(void) // Or omit the parameter altogether in C++
{
char **array = ...assignment...
size_t num_in_array = ...number of char pointers in array...
...
qsort(array, num_in_array, sizeof(char *), qsort_strcmp);
...more code...
}
Upvotes: 0
Reputation: 30235
template<size_t length> int less(const char* left, const char* right) {
return memcmp(left, right, length) < 0;
}
std::sort(array, array + array_length, less<buffer_length>);
Upvotes: 15
Reputation: 507243
boost::bind
can do it:
// ascending
std::sort(c, c + size, boost::bind(std::strcmp, _1, _2) < 0);
// descending
std::sort(c, c + size, boost::bind(std::strcmp, _1, _2) > 0);
Edit: The strings are not null-terminated:
// ascending
std::sort(c, c + array_size, boost::bind(std::memcmp, _1, _2, size) < 0);
// descending
std::sort(c, c + array_size, boost::bind(std::memcmp, _1, _2, size) > 0);
Upvotes: 3
Reputation: 328770
Use the GNU sort program (externally) if you can't fit the data into RAM: it will sort arbitrary sized files and the larger the file, the smaller the additional cost of creating the process.
Upvotes: 6
Reputation: 101494
You can use the algorithms in the STL on arrays native datatypes, not just on STL containers. The other suggestion to use std::sort won't work as posted however, because strcmp returns a value that evaluates to true for all comparisons when the strings aren't the same, not just if the left hand side is less than the right hand side -- which is what std::sort wants; a binary predicate returning true of the left hand side is less than the right hand side.
This works:
struct string_lt : public std::binary_function<bool, char, char>
{
bool operator()(const char* lhs, const char* rhs)
{
int ret = strcmp(lhs, rhs);
return ret < 0;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
char* strings [] = {"Hello", "World", "Alpha", "Beta", "Omega"};
size_t numStrings = sizeof(strings)/sizeof(strings[0]);
std::sort(&strings[0], &strings[numStrings], string_lt());
return 0;
}
Upvotes: 5
Reputation: 8209
If the files are large and do not fit in RAM, you can use bin/bucket sort to split the data into smaller files and finally aggregate the pieces in a result file. Other responses show you how to sort each individual bucket file.
Upvotes: 2
Reputation: 447
Probably the easiest way is to used the old stdlib.h function qsort. This should work:
qsort( array, num_elements, sizeof( char* ), strcmp )
Please note that this is standard C and only works reliable with English text.
If you have a list of String objects, then other things are possible in C++.
If you are on Linux and writing a gtk or Qt application then I would propose that you have a look at these libraries beforehand.
Upvotes: 2