sud03r
sud03r

Reputation: 19759

Sorting a binary search tree on different key value

Say I have a binary tree with the following definition for a node.


struct node
{
 int key1 ;
 int key2 ;
}

The binary search tree is created on the basis of key1. Now is it possible to rearrange the binary search tree on basis of key2 in O(1) space. Although I can do this in variable space using an array of pointers to nodes.

The actual problem where I require this is "counting number of occurrences of unique words in a file and displaying the result in decreasing order of frequency." Here, a BST node is


{
 char *word;
 int freq ;
}
The BST is first created on basis of alphabetic order of words and finally I want it on basis of freq.

Am I wrong at choice of data structure i.e a BST?

Upvotes: 3

Views: 5367

Answers (6)

ilya n.
ilya n.

Reputation: 18816

I think you can create a new tree sorted by freq and push there all elements popping them from an old tree.

That could be O(1) though likely more like O(log N) which isn't big anyway.

Also, I don't know how you call it in C#, but in Python you can use list but sort it by two different keys in-place.

Upvotes: 1

Mecki
Mecki

Reputation: 132919

You can easily do this in O(1) space, but not in O(1) time ;-)

Even though re-arranging a whole tree recursively until it is sorted again seems possible, it is probably not very fast - it may be O(n) at best, probably worse in practice. So you might get a better result by adding all nodes to an array once you are done with the tree and just sorting this array using quicksort on frequency (which will be O(log n) on average). At least that's what I would do. Even tough it takes extra space it sounds more promising to me than re-arranging the tree in place.

Upvotes: 1

agorenst
agorenst

Reputation: 2425

Here is my suggestion for re-balancing the tree based off of the new keys (well, I have 2 suggestions).

The first and more direct one is to somehow adapt Heapsort's "bubble-up" function (to use Sedgewick's name for it). Here is a link to wikipedia, there they call it "sift-up". It is not designed for an entirely-unbalanced tree (which is what you'd need), but I believe it demonstrates the basic flow of an in-place reordering of a tree. It may be a bit hard to follow because the tree is in fact stored in array rather than a tree (though the logic in a sense treats it as a tree) --- perhaps, though, you'll find such an array-based representation is best! Who knows.

The more crazy-out-there suggestion of mine is to use a splay tree. I think they're nifty, and here's the wiki link. Basically, whichever element you access is "bubbled up" to the top, but it maintains the BST invariants. So you maintain the original Key1 for building the initial tree, but hopefully most of the "higher-frequency" values will also be near the top. This may not be enough (as all it will mean is that higher-frequency words will be "near" the top of the tree, not necessarily ordered in any fashion), but if you do happen to have or find or make a tree-balancing algorithm, it may run a lot faster on such a splay tree.

Hope this helps! And thank you for an interesting riddle, this sounds like a good Haskell project to me..... :)

Upvotes: 1

Phil
Phil

Reputation: 183

Using a HashTable (Java) or Dictionary (.NET) or equivalent data structure in your language of choice (hash_set or hash_map in STL) will give you O(1) inserts during the counting phase, unlike the binary search tree which would be somewhere from O(log n) to O(n) on insert depending on whether it balances itself. If performance is really that important just make sure you try to initialize your HashTable to a large enough size that it won't need to resize itself dynamically, which can be expensive.

As for listing by frequency, I can't immediately think of a tricky way to do that without involving a sort, which would be O(n log n).

Upvotes: 1

yves Baumes
yves Baumes

Reputation: 9026

Map, BST are good if you need to have sorted output for your dictionnary.

And it is good if you need to mix up add, remove and lookup operations. I don't think this is your need here. You load the dictionnary, sort it, then do only look up in it, that's right ? In this case a sorted array is probably a better container. (See Item 23 from Effective STL from Scott Meyer).
(Update: simply consider that a map could generate more memory cache misses than a sorted array, as an array get its data contiguous in memory, and as each node in a map contain 2 pointers to other nodes in the map. When your objects are simple and take not much space in memory, a sorted vector is probable a better option. I warmly recommand you to read that item from Meyer's book)

About the kind of sort you are talking about, you will need that algorithm from the stl: stable_sort. The idea is to sort the dictionnary, then sort with stable_sort() on the frequence key.

It will give something like that (not tested actually, but you got the idea):

struct Node
{
char * word;
int key;
};

bool operator < (const Node& l, const Node& r)
{
    return std::string(l.word) < std::string(r.word));
}

bool freq_comp(const Node& l, const Node& r)
{
    return l.key < r.key;
}

std::vector<node> my_vector;
... // loading elements
sort(vector.begin(), vector.end());
stable_sort(vector.begin(), vector.end(), freq_comp);

Upvotes: 1

Thompsonian
Thompsonian

Reputation: 407

One approach you could consider is to build two trees. One indexed by word, one indexed by freq.

As long as the tree nodes contain a pointer to the data node, you could access if via the word-based tree to update the info, but later access it by the freq-based tree to output.

Although, if speed is really that important, I'd be looking to get rid of the string as a key. String comparisons are notoriously slow.

If speed is not important, I think your best bet is to gather the data based on word and re-sort based on freq as yves has suggested.

Upvotes: 0

Related Questions