Reputation: 195
I'm using the qsort()
that comes with the stdlib.h
library to sort an array of structures of strings.
It's essentially an array of strings but with a structure that is containing the array.
For example:
typedef struct node {
char name[MAX_SIZE + 1];
} Node;
Then my array of nodes that contains the names would be:
Node nodes_list[MAX_SIZE + 1];
My question is, I want to sort nodes_list
so when I print the following:
for (i = 0; i < size; i++) {
printf("%s\n", nodes_list[i].name);
}
it prints all the names in alphabetical order.
I would like to do sort the list using qsort
and my comparator function is this:
int compare(const void *a, const void *b) {
const char **ia = (const char **)a;
const char **ib = (const char **)b;
return strcmp(*ia, *ib);
}
when I run the function with qsort
:
qsort(nodes_list, size, sizeof(Node), compare);
I get a segmentation fault (core dumped).
I know I am getting a segmentation fault with this snippet of code because without it, I can print the list of names fine. Not sorted of course.
Can someone help?
Upvotes: 0
Views: 3311
Reputation:
Your comparison function is wrong for your array format.
Here's a simple checklist you can follow to get the types and sizes right when using qsort:
sizeof *x
where x
is the first argument.void *
aren't necessary.const
, but if you do, it's because you've put the const
in the wrong place. To assign a const void *
successfully without a cast, the destination type should have exactly one *
after the const
keyword. const char *
and char const *
are OK (and equivalent to each other); const char *const *
is also OK (and different); const char **
is wrong. And if you can't put a const
before the *
because you don't have a *
because you typedef'ed the pointer type, this is why you shouldn't do that.const
, the type of the pointers declared at the beginning of the comparison function should be exactly the same as the type of the first argument to qsort, after applying the "array decays to pointer" rule if the first argument to qsort is the name of an array.In your case, the first argument to qsort is nodes_List
which is an array of Node
, so apply the decay-to-pointer rule and you get a Node *
, then add a const
and you get:
const Node *a_node = a;
const Node *b_node = b;
Now you have a nice pair of properly typed pointers, and you simply compare them in the obvious way:
return strcmp(a_node->name, b_node->name);
To explain why rule #4 works, you have to look closely at the memory layout. Supposing that MAX_SIZE is 15, so that MAX_SIZE+1 is a nice round 16, your Node
type contains a 16-byte array of char, and your nodes_list
contains 16 of those for a total of 16*16=256 bytes. Suppose that nodes_list is located at memory address 0x1000. Then the layout is:
+---------------+---------------+ +---------------+
| nodes_list[0] | nodes_list[1] |...............| nodes_list[15]|
+---------------+---------------+ +---------------+
^ ^ ^ ^
0x1000 0x1010 0x10f0 0x1100
Addresses 0x1000 through 0x10ff are actually part of the object. 0x1100 is the trailing edge - one byte past the end.
Suppose further that the array is half-full (size
is 8), and it is populated with these 8 strings:
Hotel Foxtrot Echo Charlie Golf Delta Bravo Alpha
and that the unused portions are filled with 0's. The object is made up of these 256 bytes (I've added spaces and line breaks for illustration purposes)
H o t e l \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
F o x t r o t \0 \0 \0 \0 \0 \0 \0 \0 \0
E c h o \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
C h a r l i e \0 \0 \0 \0 \0 \0 \0 \0 \0
G o l f \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
D e l t a \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
B r a v o \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
A l p h a \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
... 128 more \0's
Now, you pass qsort the starting address of this block of memory (first arg, nodes_list
, 0x1000) plus 2 pieces of information about its internal structure: the number of elements (2nd arg, size
, 8) and the number of elements (3rd arg, sizeof Node
, 16). With that information it knows that the elements of the array are at addresses 0x1000, 0x1010, 0x1020, ... 0x1070. It picks a pair of them - which pair it chooses depends on what sorting algorithm it uses - let's say for simplicity it is a stupid bubble sort which starts by comparing the first two elements.
qsort calls your comparison function with the addresses of the elements, 0x1000 and 0x1010. It doesn't know their types, but it knows their sizes. Each one is an array element occupying 16 bytes.
Your comparison function receives a=0x1000
and b=0x1010
. They are pointers to 16-byte objects - specifically, they each point to a struct Node
. If you do the wrong thing, and cast them to char **
, what happens? Well, you get a char **
with value 0x1000, and you have to dereference that char **
to get a char *
to pass to strcmp
, so you do that dereference, and end up loading the bytes 'H', 'o', 't', 'e'
as a pointer value (assuming your pointers are 4 bytes long). On a big-endian machine with ASCII as the charset, this is a pointer to memory address 0x486f7465, which you pass to strcmp
. strcmp
crashes. The result of trying struct Node **
is basically the same.
Another good thing to know is how qsort uses the member size information in its reordering of the array. The 3rd arg is not just the size of an object that the comparison acts on, it's also the size of the object that gets moved as a unit when reordering the array. After your comparison function returns 1 (strcmp("Hotel", "Foxtrot")), our hypothetical bubble sort implementation of qsort will swap the objects at 0x1000 and 0x1010 to put them in the correct order. It will do this with a series of 3 memcpy's of 16 bytes each. It has to move all those extra \0
's around because it doesn't know that they are useless. Those 16-byte objects are opaque to qsort. This might be a reason to consider building a secondary array of pointers and qsorting it instead of the main array, when your main array has objects that are very large.
Upvotes: 1