Reputation: 2172

Why a lot of binary-tree data structures in C do not have a parent node pointer?

I'm new to C programming, and I'm learning C algorithms with C.

Here is my problem about how to define the binary tree node data structure.

Use or NOT use a parent node pointer

Here are 2 typical sample code for defining a Node data structure.

Without parent node pointer

typedef struct binaryTreeNode_{
  int key;
  void *data;
  binaryTreeNode_ *leftNode;
  binaryTreeNode_ *rightNode;
} binaryTreeNode;

With parent node pointer

typedef struct binaryTreeNode_{
  int key;
  void *data;
  binaryTreeNode_ *leftNode;
  binaryTreeNode_ *rightNode;
  binaryTreeNode_ *parentNode;
} binaryTreeNode;

My question

Obviously, using a node structure with a parent node pointer will make a lot of work much more easier. Like traverse a node/a tree, DFS/BFS with binary tree. So my question is why there are some solutions that are based on a structure without parent node?.

Are there any historical reasons? If simply because the limitation of RAM/DISK capacity, I think we can drop the solution that does not have a parent node, can't we?

Maybe not relavent

Just like Linked List and Doubly Linked List, should we use Doubly Linked List to implement Stack and Queue?

Upvotes: 7

Answers (7)

kofemann

Reputation: 4423

The binary tree does not define operations which requires a reference to the parent node. You always traverse from top to down.

Upvotes: 1

Gaslight Deceive Subvert

Reputation: 20438

Most balanced trees such as red-black ones and AVL trees do indeed have parent pointers embedded in their node structure. And they are binary search trees with some extra properties. The reason is because it makes performing the rebalancing operations necessary much simpler.

There are also some scenarios which are very cumbersome without parent pointers.

Consider a binary search tree and you pick a victim node at random from it. How do you remove it from the tree? Without a parent pointer you don't know where in the tree it sits so you have to perform a search to find its parent so you know what edge to update. With a parent pointer, you would instead do it roughly like this:

parent = victim->parent;
if (parent) {
    if (parent->left == victim) {
        parent->left = NULL;
    } else {
        parent->right = NULL;
    }
}
... free(victim);

Another operation that is tricky without parent pointers is finding a nodes successor. Especially if the tree allows multiple objects with the same key value to be stored, which is needed if you are implementing a multiset. But with them, it is easy:

bstree *successor(bstree *node) {
    // If node has a right child, the successor is the minimum 
    // node of that subtree.
    if (node->right) {
        return minimum(node->right);
    }
    // If node hasn't, we look upwards.
    bstree *x = node->parent;
    while (x && node == x->right) {
        node = x;
        x = node->parent;
    }
    return x;
}

Parent pointers take away from the beauty of the data structure. They make it so that you can't treat a tree as a composition of trees anymore which is a shame. But in most practical implementations of tree structures I've seen they are there because they are very useful.

Upvotes: 2

Adam Mihalcin

Reputation: 14478

There are very good reasons to use a tree without parent pointers, and memory usage isn't the issue.

In the functional programming world (think Lisp, Scheme, Standard ML, OCaml, Haskell, F#, and so on), trees and linked lists are very common data structures, so even in C a lot of thinking about trees and lists is influenced by FP. Trees and lists are recursively defined (a tree is either a leaf or an interior node whose children are trees, and a list is either nil or a node with a data element and a list attached), and they are almost always implemented as immutable datastructures. Immutability is helpful because it makes parallelism cleaner (no programmer-visible locks), sharing of data between functions nicer (don't have to copy data), and proofs easier.

The problem with a parent pointer or a doubly linked list is that suddenly immutability goes out the window. With an immutable object, you have to specify everything about the object at time of creation. So, if you want to maintain immutability, you can't create a node until its children have been created (because those children have to be specified at the time of creation), but you also can't set parent pointers on the children before the parent has been created (because the parent doesn't exist). In other words, immutability does not play well with circular dependencies. Similarly, you can't create a doubly linked list without some mutation, since without mutation you can't create the first node until the second node has been created, and you can't set the previous pointer on the second node until the first node has been created.

The FP folks manage to write a lot of code with strictly immutable datastructures, and prove lots of useful properties to boot. Certainly, maintaining parent pointers makes the programmer's life more difficult, because as soon as you change one node in the tree you have to make changes to all its children's parent pointers, which is a pain.

Because so much thinking on lists and trees has been influenced by FP, which doesn't include parent pointers or doubly linked lists, and because maintaining parent pointers is a tricky business likely to introduce bugs, many C tree implementations don't use parent pointers.

Also, one other note: you ask about using linked lists versus doubly linked lists for stacks and queues. There is no need to use a doubly linked list to implement a stack, because you don't need efficiency in traversing any element but the first (and the second, if the stack is mutable). There is a cute trick to implement a queue with two stacks, which provides amortized constant time queue operations. If you're not using that, though, a queue is also a decent use case for either a doubly linked list or an array.

Upvotes: 12

NPE

Reputation: 500923

For many algorithms, parent pointers are superfluous. They do, however, incur memory expense and additional algorithmic complexity.

Just like Linked List and Doubly Linked List...

The difference is that the latter can perform more operations with O(1) time complexity. The cost is the additional memory overhead. In other words, there's a time-memory tradeoff.

should we use Doubly Linked List to implement Stack and Queue?

Both can be implemented using a doubly-linked list. However, for one of the two structures (I leave it to you to figure out whether that's the stack or the queue!), the extra overhead of a doubly-linked list buys absolutely nothing.

Upvotes: 4

dirkgently

Reputation: 111316

A tree rarely requires a parent pointer for the regular/simple operations. It is only when you are doing something exotic (say retracing the path back from a leaf to a node) that a parent pointer maybe required.

Are there any historical reasons? If simply because the limitation of RAM/DISK capacity, I think we can drop the solution that does not have a parent node, can't we?

Some historical reasons too. Also, memory constraints would require you to use the smallest, most compact structures. Even to this day, there are embedded systems with stringent memory requirements. Also, you wouldn't really want to mess with existing code that works, so these things stick.

So my question is why there are some solutions that are based on a structure without parent node?

Possibly because their applications did not require accessing the parent that often. Note, that this is a typical space-time tradeoff.

Just like Linked List and Doubly Linked List, should we use Doubly Linked List to implement Stack and Queue?

That depends on your application and the gurantees your data structures need to provide per operation. Also, you may be interested in looking up the XOR linked list!

Upvotes: 9

Basile Starynkevitch

Reputation: 1

Some balanced trees implementation do maintain a pointer to parent. Apparently Glib Balanced Trees have nodes containing such a pointer.

But, when you have an internal recursive function traversing a tree, you can code it so that it passes both the tree and its immediate parent during recursive calls.

Upvotes: 0

Mörre

Reputation: 5723

You need a parent node only if you have code that jumps right "into the tree". For example, in web browsers you always get a DOM element in event handlers and have to traverse the DOM tree upwards from there, to find some parent HTML element.

However, in many data structures you usually start at the root and go down, either depth first or breadth-first - does not matter, you KNOW where you started from, so no need to have parent references.

http://en.wikipedia.org/wiki/Depth-first_search

http://en.wikipedia.org/wiki/Breadth-first_search

Basically, your algorithm determines your data structure. Both go together, it's not useful to consider one without the other.