Binary Search Trees

A tree has the following properties:

  • One root node
  • Every node is connected by one edge with another node, except the root node
  • A unique path traverses from the root node to each node

A Binary Tree has the following additional property:

  • Every node has a maximum of two children.

Recursive Definition of a tree: A tree is either empty or has a root and zero or more subtrees, each of which is connected to the root of the parent tree by an edge.

Implementation

Let's implement a tree in Python in a Object-Oriented fashion.

Let's say you saved above file as binarytree.py, now if you open a Python session in bash or sh you can try out the following.

>>> from binarytree import BinaryTree
>>> r = BinaryTree('a')
>>> r.getRootVal()
'a'
>>> r.insertLeft('b')
>>> r.getLeftChild().getRootVal()
'b'

Traversals

We implement the pre-order, post-order and inorder traversals for our tree data structure as external functions:

Let us see how this works for a concrete example binary tree.

       30
   25     42
12 26  3 55

Pre-Order Traversal

NLR

In pre-order traversal we print the current node before recursing in all left nodes, printing each and if none are left into the right nodes, printing them and unwinding.

For our little binary tree above, the pre-order algorithm will output:

30, 25, 12, 26, 42, 3, 55

Post-Order Traversal

LRN

The emphasis here is on checking and recursing into the left and right node before we fall through to the print node statement and unwind to the root node, which itself will fall through to the print node statement if the left and right nodes have been recursed already.

For our binary tree above the postorder function will output:

12, 26, 26, 3, 55, 42, 30

You should see now why the root node gets printed out last.

Inorder Traversal

LNR

With inorder traversal the emphasis is on checking all left nodes before printing. Then checking for any right nodes, unwinding and so forth.

12, 25, 26, 30, 3, 42, 55

In case my writing was not clear enough, there is a great visualisation on YouTube of these traversals: https://www.youtube.com/watch?v=S5emQOEMiqc

Complete Binary Tree

A complete binary tree is a data structure where every node has two childs exception can only be made at the bottom level of the tree. By following these rules we can create a balanaced tree, a tree which has on each sides of a parent never more than 1 level of difference, e.g.

     3
 12  11
         17
      19

Above tree is not balanced and it is also not a complete binary tree. Yet above data structure does qualify as ordinary binary tree with a maximum of 2 nodes per subtree.

The balance factor of above root node is -2. That can be calculated by using the following formula:

balanceFactor=height(leftSubTree)−height(rightSubTree)

For above binary tree the factors are as follows:

     -2
 0     -2
           1
        0

Min Heap

      3
  12   11
17 14

Heap Order

In a min heap the order is such as that the parent node is always smaller than any child node. In a max heap this is vice versa. To maintain this order a heap is being heapyfied.

Heap Structure

The heap structure property of a binary heap in this case a complete binary tree, is as mentioned before that every node has two childs but the bottom level, where new items get added to and being removed from the left.

Adding Elements

Now Adding elements into our min heap works by adding a new element to the bottom level on the right. This way heap structure is maintained.

But what about heap order? To maintain this, we have to swap items, as long as the parent is larger as our new item, if it is smaller than our new item we stop swapping with parents.

Since this is a heap it is very easy to calculate the position of a parent node in a list representation of our complete binary tree. If the root node has the index 0 in our list, the parent of a left node is

ParentL=(index(node)-1)/2

The parent of a right node is

ParentR=(index(node)-2)/2

This is because the binary tree is complete, or a heap.

Removing Elements

This is slightly more difficult, but it will return the elements in the tree in a sorted order.

1) Remove Payload of Root Node
2) Make payload of most right bottom node the new root node payload
3) Swap root payload with smallest child until there is no direct child left that is smaller

Represanation as list

As being a heap our complete tree can be represented in a list:

      3
  12   11
17 14

Fig.1: Complete Binary Tree

3, 12, 11, 17, 14
Fig.2: List Representation

If we now, for instance, we remove an item from the min heap, this will be the root element, 3. The new payload of the root node will be 14, which needs to be swapped with 11, making 11 the new root.

      11
  12   14
17

Fig.3: Complete Binary Tree after first removal

11, 12, 14, 17
Fig.4: List Representation after first removal

3
Fig.5: Resulting sorted list after first removal

In-place Heap Sort Algorithm

Additional memory for the new list will be used, therefore an in-place heap is the least we can want. Our in-place heap sort will use a max heap and swap the largest item to the end of the list making this last element a partition on its own. The old list need to be heapified in the next step to maintain the order property.

heap-inplace1 Fig. 6 - First pass in an inplace Max Heap

heap-inplace1 Fig. 7 - Second pass in an inplace Max Heap

Complexity

The binary strucutre of a heap means that there will be a maximum of 2h+1-1 elements in a heap of height h.

n is of the form:

$$ \mathcal n = 2^{h+1}-1 $$

      3      (h=0)
  12   11 (h=1)
17 14     (h=2)

22+1-1 = 7, and indeed above binary tree has less than 7 elements.

The height of a binary tree can be calculated with log2 n, which is the inverse relation of the above. The maximum number of comparisons necessary to heapify a binary tree after payload removal is log2 n. We do this n-1 times to completely remove all items, effictively sorting the heap, so the worst case complexity of heap sort is

$$ \mathcal O(n \log n) $$

Building the heap is O(n), but this does not dominate. Once again, this is the worst case scenario, where an element needs to sift through all levels of the tree down to the bottom level. Since with quick sort the worst case complexity is O(n2), heao sort is used in applications where caclulations must be completed in guaranteed time, like in mission-critical space missions, medical applications like life-supporting devices. In practice quick sort is faster (and most commonly used), but average complexity of O(n log n) of quick sort is not guaranteed.

AVL Trees

AVL trees are not complete as a heap, so less than two childs is allowed at any level of the tree.

AVL Trees are binary trees that are kept in a balanced state. This means if you add or remove nodes from it, you have to perform a left or right rotation in order to keep it or any suibtrees from becoming right or left heavy.

In the following you will see a gree where we delete the node 4 and perform a left rotation in order to keep the AVL tree balanced.

avltree1

Fig 8: The AVL tree in its original state

avltree1

Fig 9: The AVL tree after removing node 4

avltree1

Fig 10: The AVL tree after left rotation around the root

Deleting and removing from an AVL tree has the complexity of O(log n).

References

University of Maryland (1998) Heapsort - Analysis and Partitioning [online]. Available at http://www.cs.umd.edu/~meesh/351/mount/lectures/lect14-heapsort-analysis-part.pdf (Accessed 1 June 2014).

The University of Auckland (1998) Data Structures and Algorithms - Comparing Quick and Heap Sorts [online]. Available at https://www.cs.auckland.ac.nz/software/AlgAnim/qsort3.html (Accessed 1 June 2014).

Piwek, P., Walker R., Willis A. (2013) Algorithms, data structures and computability: Sorting, Milton Keynes, The Open University.

Miller, B., Ranum, D. (2011) Problem Solving with Algorithms and Data Strucutres using Python, Sherwood, Franklin, Beedle & Associates.