Pastx
Pastx

Reputation: 767

Duplicate removal sorting string algorithm in C

Hello I wonder if there is any sorting algorithm similar to qsort that would also remove duplicate indexes.

Here is problem.

This is input:

array[0]->["astr"]
array[1]->["zstr"]
array[2]->["cstr"]
array[3]->["astr"]
array[4]->["zstr"]

Here's my sorting algorithm:

int compare(const void *u, const void *h) {
    const char **iu = (const char **) u;
    const char **ih = (const char **) h;
    return strcasecmp(*iu, *ih);
}

...
qsort(array, n, sizeof (char *), compare);

Output is :

array[0]->["astr"]
array[1]->["astr"]
array[2]->["cstr"]
array[3]->["zstr"]
array[4]->["zstr"]

What i need is :

array[0]->["astr"]
array[1]->["cstr"]
array[2]->["zstr"]

I could try to read whole array in for cycle and test every index if it is same as the next one and then realloc new array where i would store only unique words but this is very slow so i need to find sorting algorithm that would do this for me faster than i can.

Upvotes: 2

Views: 851

Answers (3)

chill
chill

Reputation: 16888

You can just remove duplicates after the sort. This has the advantage of not needing additional storage and is O(n), so the total complexity is still the O(nlogn) of the sort.

int unique (int n, const char **a) {
   int dst = 0, i;
   for (i = 1; i < n; ++i) {
       if (strcmp (a[dst], a[i]) != 0)
           a[++dst] = a[i];
   }

   return dst + 1;
}

Upvotes: 3

kajacx
kajacx

Reputation: 12939

You can always take sorted duplicite array and create new sorted non-duplicite by simple copiing, just call realloc at the end to get rid of unused indexes.

Upvotes: 1

Bathsheba
Bathsheba

Reputation: 234715

You probably can't adapt qsort too easily since deleting array elements could invalidate pivot points elsewhere in the recursion.

I'd use an insertion sort algorithm: that would be trivial to adapt; you might even get deletion of duplicates for free.

Upvotes: 3

Related Questions