andars
andars

Reputation: 1404

Convention for Huffman Coding

Is there a convention for generating a Huffman encoding for a certain alphabet? It seems like the resultant encoding depends both on whether you assign '0' to the left child or the right child as well as how you determine which symbol will go to the left tree.

Wikipedia says that:

As a common convention, bit '0' represents following the left child and bit '1' represents following the right child.

So that is an answer to the first half of the variance. However, I couldn't find any convention for the second half. I would assume something like making the node with lower probability go on the left, but several example Huffman trees online don't do this.

For example:

huffman tree

So is there a convention for the assignment of nodes to left and right, or is it up to the implementation?

I apologize if this is a duplicate, but I wasn't able to find an answer.

Upvotes: 3

Views: 1415

Answers (2)

David190
David190

Reputation: 34

Check it out

        class Nodo{
      constructor(v=null,f=null,l=null,r=null){
            this.f=f
            this.v=v
            this.l=l
            this.r=r
          }
      }
        function EnCrypt(text){
            let lista=[]
            for(let i=0;i<text.length;i++){//Create the list with the appearances 
                !lista.find(e => e.v===text[i]) && lista.push(new Nodo(text[i],(text.match(new RegExp(text[i],"g")) || []).length))
            }
            lista=lista.sort((a,b)=>a.f-b.f)//Order from smallest to largest
            //----------------------------------------------------
            function createNew(){//Create the tree
                let nodos=lista.splice(0,2)
                lista.push(new Nodo(null,nodos[0].f+nodos[1].f,nodos[0],nodos[1]))
                if(lista.length==1)return lista
                createNew()
            }createNew()
          ///-----------------------------------------------
            let Arbol=lista[0]
            function Codigo(nodo,c=""){//recursively traverse the tree 
                if(!nodo.l && !nodo.r)return [nodo.v,c]
                return Codigo(nodo.l,c+"0")+";"+Codigo(nodo.r,c+"1")
            }
           //-----------------------------------------------
            const codigo=(Codigo(Arbol)).split(";")
            let finish=""
            text.split("").map(t=>{
                codigo.map(e => {
                    if(e.split(",")[0]==t)finish+=e.split(",")[1]
                })
            })
            return {
                cod:finish,
                dicc:codigo
            }
        }
        function DeCrypt(key,res=""){
            let {cod,dicc}=key
            let temp=""
            for(let i=0;i<=cod.length;i++){
                temp+=cod.substr(i,1)
               dicc.map((d)=>{
                    d=d.split(",")
                    if(temp==d[1]){
                        res+=d[0]
                        temp=""
                        cod=cod.substr(i)
                        i=0
                    }
                })
            }
            return res
        }
        function Huffman(){
                const text=document.querySelector("#newValue").value
                const comp= EnCrypt(text)
                document.querySelector(".res").innerHTML=JSON.stringify(comp,null, 4)
            }
        function HuffmanDecode(){
                const text=JSON.parse(document.querySelector("#huffman").value)
                const comp= DeCrypt(text)
                document.querySelector(".res2").innerHTML=comp
            }
<h1></h1>
<input type="text"
       placeholder="set value (min 2 chars)" id="newValue">
<button onclick="Huffman()">Go</button>
<div class="res" style="white-space: pre-wrap;"></div>

<input type="text" placeholder="paste the result from above" id="huffman"><button onclick="HuffmanDecode()">decode</button>
 <div class="res2" style="white-space: pre-wrap;"></div>

Upvotes: -1

Mark Adler
Mark Adler

Reputation: 112374

Yes, in fact there is. Not so much a convention for interoperability, but rather for encoding efficiency. It's called Canonical Huffman, where the codes are assigned in numerical order from the shortest codes to the longest codes, and within a single code length, they are assigned in a lexicographical order on the symbols. This permits transmitting only the length of the code for each symbol, as opposed to the entire tree structure.

Generally what is done is to use the Huffman algorithm tree only to determine the number of bits for each symbol. The tree is then discarded. Bit values are never assigned to the branches. The codes are then built directly from the lengths, using the ordering above.

Upvotes: 3

Related Questions