Reputation: 59131
I am looking into using an Edit Distance algorithm to implement a fuzzy search in a name database.
I've found a data structure that will supposedly help speed this up through a divide-and-conquer approach: Burkhard-Keller trees (BK-trees). The problem is that I can't find very much information on this particular type of tree.
If I populate my BK-tree with arbitrary nodes, how likely am I to have a balance problem?
If it is possible or likely for me to have a balance problem with BK-trees, is there any way to balance such a tree after it has been constructed?
What would the algorithm look like to properly balance a BK-tree?
My thinking so far:
It seems that a node's children are keyed by their distance from that node, so I can't simply rotate a given node in the tree without rebuilding the entire subtree under it. However, if I can find an optimal new root node, rebuilding might be precisely what I should do. I'm not sure how I'd go about finding an optimal new root node, though.
I'm also going to try a few methods to see if I can get a fairly balanced tree by starting with an empty tree, and inserting pre-distributed data.
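To make the experiment concrete, here is a minimal Python sketch of a BK-tree with a depth measurement, so different insertion orders can be compared. The names `levenshtein`, `BKTree`, `build` and `depth`, and the sample name list, are my own for illustration and are not taken from any particular library or article; treat this as a sketch under those assumptions rather than a reference implementation.

```python
import random


def levenshtein(a, b):
    """Edit distance via the usual two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is a (word, children) pair; children maps distance -> node."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def insert(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child  # descend into the child at the same distance

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of word."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            cand, children = stack.pop()
            d = self.distance(word, cand)
            if d <= max_dist:
                results.append((d, cand))
            # Triangle inequality: only children keyed in
            # [d - max_dist, d + max_dist] can contain matches.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)

    def depth(self):
        """Number of nodes on the longest root-to-leaf path."""
        best = 0
        stack = [(self.root, 1)] if self.root else []
        while stack:
            (_, children), level = stack.pop()
            best = max(best, level)
            stack.extend((c, level + 1) for c in children.values())
        return best


def build(words):
    tree = BKTree()
    for w in words:
        tree.insert(w)
    return tree


# Hypothetical sample data, purely for illustration.
names = ["bill", "will", "william", "billy", "bull", "jill"]
tree = build(names)
print(tree.search("bill", 1))
print("depth in given order:", tree.depth())

# Compare a few shuffled insertion orders of the same data.
for seed in range(3):
    shuffled = names[:]
    random.Random(seed).shuffle(shuffled)
    print("seed", seed, "depth:", build(shuffled).depth())
```

Since the first word inserted becomes the root and is never moved, the depth printed for each ordering gives a rough feel for how much the choice of root and insertion order matters on a given data set.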
FYI, I am not currently worrying about the name-synonym problem (Bill vs William). I'll handle that separately, and I think completely different strategies would apply.
Upvotes: 8
Views: 2143
Reputation: 12592
There is a Lisp example in this article: http://cliki.net/bk-tree. As for the tree becoming unbalanced, I think the data structure and the method are complicated enough already, and the author doesn't say anything about unbalanced trees. If you do run into an unbalanced tree, maybe this structure isn't the right fit for you?
Upvotes: 0