Tree traversal in C++ and Haskell

Question

I am new to Haskell. I am trying to get a sense how well haskell can handle the recursive function call together with their lazy evaluation. The experiement I made is simply building binary search tree in both C++ and Haskell and traverse them respectively in postorder. C++ implementation is standard one with auxilary stack. (I just print the element out once I visit it).

Here is my haskell code:

module Main (main) where

import System.Environment (getArgs)
import System.IO
import System.Exit
import Control.Monad(when)
import qualified Data.ByteString as S

main = do
     args <- getArgs
     when (length args < 1) $ do
          putStrLn "Missing input files"
          exitFailure

     content <- readFile (args !! 0)
     --preorderV print $ buildTree content
     mapM_ print $ traverse POST $ buildTree content
     putStrLn "end"


data BSTree a = EmptyTree | Node a (BSTree a) (BSTree a) deriving (Show)
data Mode = IN | POST | PRE

singleNode :: a -> BSTree a
singleNode x = Node x EmptyTree EmptyTree

bstInsert :: (Ord a) => a -> BSTree a -> BSTree a
bstInsert x EmptyTree = singleNode x
bstInsert x (Node a left right)
          | x == a = Node a left right
          | x < a  = Node a (bstInsert x left) right
          | x > a  = Node a left (bstInsert x right)

buildTree :: String -> BSTree String
buildTree = foldr bstInsert EmptyTree . words

preorder :: BSTree a -> [a]
preorder EmptyTree = []
preorder (Node x left right) = [x] ++ preorder left ++ preorder right

inorder :: BSTree a -> [a]
inorder EmptyTree = []
inorder (Node x left right) = inorder left ++ [x] ++ inorder right

postorder :: BSTree a -> [a]
postorder EmptyTree = []
postorder (Node x left right) = postorder left ++  postorder right ++[x]

traverse :: Mode -> BSTree a -> [a]
traverse x tree = case x of IN   -> inorder tree
                            POST -> postorder tree
                            PRE  -> preorder tree


preorderV :: (a->IO ()) -> BSTree a -> IO ()
preorderV f EmptyTree = return ()
preorderV f (Node x left right) = do 
                                     f x
                                     preorderV f left
                                     preorderV f right

My test result show that C++ significantly outperform Haskell:

C++ performance: (note that first15000.txt is roughly 5 times of first3000.txt)

time ./speedTestForTraversal first3000.txt > /dev/null 

real    0m0.158s
user    0m0.156s
sys     0m0.000s
time ./speedTestForTraversal first15000.txt > /dev/null 

real    0m0.923s
user    0m0.916s
sys     0m0.004s

Haskell with same input file:

time ./speedTestTreeTraversal first3000.txt > /dev/null 

real    0m0.500s
user    0m0.488s
sys     0m0.008s
time ./speedTestTreeTraversal first15000.txt > /dev/null 

real    0m3.511s
user    0m3.436s
sys     0m0.072s

What I expected that haskell should be not too far away from C++. Did I make some mistake? Is there any way to improve my haskell code?

Thanks

Edit: Oct. 18 2014

After testing serval occasions, haskell's traversal is still significantly slower than C++ implementation. I would like to give Cirdec's answer a full credit, since he points out inefficiency of my haskell implementation. However, my original question is to compare C++ and haskell implementation. So I would like to keep this question open and post my C++ code to encourage further discussion.

#include 
#include 
#include 
#include 
#include 
using namespace std;
using boost::algorithm::trim;
using boost::algorithm::split;


template
class Node
{
public:
    Node(): val(0), l(NULL), r(NULL), p(NULL) {};
    Node(const T &v): val(v), l(NULL), r(NULL), p(NULL) {}
    Node* getLeft() {return l;}
    Node* getRight(){return r;}
    Node* getParent() {return p;}
    void  setLeft(Node *n) {l = n;}
    void  setRight(Node *n) {r = n;}
    void  setParent(Node *n) {p = n;}
    T  &getVal() {return val;}
    Node* getSucc() {return NULL;}
    Node* getPred() {return NULL;}
private:
    T val;
    Node *l;
    Node *r;
    Node *p;
};

template
void destoryOne(Node* n)
{
    delete n;
    n = NULL;
}

template
void printOne(Node* n)
{
    if (n!=NULL)
    std::cout << n->getVal() << std::endl;
}




template
class BinarySearchTree
{
public:
    typedef void (*Visit)(Node *);

    BinarySearchTree(): root(NULL) {}
    void delNode(const T &val){};
    void insertNode(const T &val){
    if (root==NULL)
        root = new Node(val);
    else {
        Node *ptr = root;
        Node *ancester = NULL;
        while(ptr && ptr->getVal()!=val) {
        ancester = ptr;
        ptr = (val < ptr->getVal()) ? ptr->getLeft() : ptr->getRight(); 
        }
        if (ptr==NULL) {
        Node *n = new Node(val);
        if (val < ancester->getVal())
            ancester->setLeft(n);
        else
            ancester->setRight(n);
        } // else the node exists already so ignore!
    }
    }
    ~BinarySearchTree() {
    destoryTree(root);
    }
    void destoryTree(Node* rootN) {
    iterativePostorder(&destoryOne);
    }

    void iterativePostorder(Visit fn) {
    std::stack* > internalStack;
    Node *p = root;
    Node *q = root;
    while(p) {
        while (p->getLeft()) {
        internalStack.push(p);
        p = p->getLeft();
        }
        while (p && (p->getRight()==NULL || p->getRight()==q)) {
        fn(p);
        q = p;
        if (internalStack.empty())
            return;
        else {
            p = internalStack.top();
            internalStack.pop();
        }
        }
        internalStack.push(p);
        p = p->getRight();
    }
    }


    Node * getRoot(){ return root;}
private:
    Node *root;
};



int main(int argc, char *argv[])
{
    BinarySearchTree bst;
    if (argc<2) {
    cout << "Missing input file" << endl;
    return 0;
    }
    ifstream inputFile(argv[1]);
    if (inputFile.fail()) {
    cout << "Fail to open file " << argv[1] << endl;
    return 0;
    }
    while (!inputFile.eof()) {
    string word;
    inputFile >> word;
    trim(word);
    if (!word.empty()) {
        bst.insertNode(word);
    }
    }

    bst.iterativePostorder(&printOne);

    return 0;
}

Edit: Oct 20 2014 Chris's answer in below is very thorough and I can repeat the result.

CR Drost · Accepted Answer

I generated a file containing all of the 4-letter strings on the lowercase ASCII alphabet abcdefghijklmnopqrstuvwxyz separated by spaces; and I think I got it in the right order so that the tree your code generates is perfectly balanced.

I chose this length because it takes 3.4 seconds on my computer, much like your 3.5s Haskell run. I called it 26_4.txt for obvious reasons. It sounds like actually your data set is close to 26⁴ words, so it's comparable in length too.

A lower bound on runtime would be something like:

import System.IO
main = do
    mylist <- readFile "26_4.txt"
    mapM_ putStrLn (words mylist)

and this for my data set reduces to taking 0.4s (piping stdout to /dev/null of course). So we can't expect more than, say, a factor of 10 speedup on this sort of problem from Haskell, it looks like. However, that factor is well within the bounds for your problem; C++ takes twice as long as this super-simple program.

But, no processing is an unrealistic goal. We can get a bound which is more realistic if we use a data structure which was already optimized for us by professionals who understand Haskell better:

import System.IO
import qualified Data.Map.Strict as Map

balancedTree = Map.fromList . map (\k -> (k, ()))

serializeTree = map fst . Map.toList

main = do
    mylist <- readFile "26_4.txt"
    mapM_ putStrLn (serializeTree $ balancedTree $ words mylist)

This runs for something more like 1.6s on my machine. It's not quite as fast as your C++ but your C++ does not balance the tree as far as I can tell.

I made Cirdec's modification to the code and it took your code down to 3.1s, so it only shaved about 10% of that runtime off of the file.

However, on my machine, this file doesn't even run unless you give it more memory with RTSopts. And that points to a really important optimization: tail-call optimization. The code from both you and Cirdec is suboptimal in a special way: it is not tail-recursive, meaning that it cannot be turned into a loop by GHC. We can make it tail-recursive by writing an explicit stack of 'stuff to be done' which we descend into:

postorder :: BSTree a -> [a]
postorder t = go [] [t]
    where go xs [] = xs
          go xs (EmptyTree : ts) = go xs ts
          go xs (Node x a b : ts) = go (x : xs) (b : a : ts)

This change seems to bring it all the way down to 2.1s.

Another difference between C++ and Haskell which causes some amount of time wasted is that the Haskell version will allow you to lazily construct your search tree, whereas your C++ code wouldn't allow this. We can make the Haskell code strict to deal with this, providing something like:

data BSTree a = EmptyTree
          | Node  !a !(BSTree a) !(BSTree a) deriving (Show)

This change combined with Cirdec's brings us down to 1.1 seconds, which means that we're on-par with your C++ code, at least on my machine. You should test this on your machine to see if those are also the main problems. I think any further optimizations can't be done "from the armchair" and must instead be done with a proper profiler.

Remember to ghc -O2 the code, else the tail-calls and other optimizations may not take.

Tree traversal in C++ and Haskell

Answers (2)

Related Questions