Reputation: 16974
I'm scanning a large data source, currently about 8 million entries, extracting on string per entry, which I want in alphabetical order.
Currenlty I put them in an array then sort an index to them using qsort()
which works fine.
But out of curiosity I'm thinking of instead inserting each string into a data structure that maintains them in alphabetical order as I scan them from the data source, partly for the experience of emlplementing one, partly because it will feel faster without the wait for the sort to complete after the scan has completed (-:
What data structure would be the most straightforward to implement in C?
UPDATE
To clarify, the only operations I need to perform are inserting an item and dumping the index when it's done, by which I mean for each item in the original order dump an integer representing the order it is in after sorting.
SUMMARY
Upvotes: 1
Views: 1475
Reputation: 2264
You are already using the optimal approach. Sort at the end will be much cheaper than maintaining an online sorted data structure. You can get the same O(logN) with a rb-tree but the constant will be much worse, not to mention significant space overhead.
That said, AVL trees and rb-trees are much simpler to implement if you don't need to support deletion. Left-leaning rb tree can fit in 50 or so lines of code. See http://www.cs.princeton.edu/~rs/talks/LLRB/ (by Sedgewick)
Upvotes: 2
Reputation: 2967
you should take a look at Trie datastructure wikilink i think this will serve what you want
Upvotes: 0
Reputation: 176
You could implement a faster sorting algorithm such us Timsort or other sorting algorithms with a nlog(n) worst case and just search it using Binary search since its faster if the list is sorted.
Upvotes: 0
Reputation: 363577
Binary search trees. Or self-balancing search trees. But don't expect those to be faster than a properly implemented dynamic array, since arrays have much better locality of reference than pointer structures. Also, unbalanced BSTs may "go linear", so your entire algorithm becomes O(n²), just like quicksort.
Upvotes: 3