What is a faster way of compressing this data?

Question

I have a program that computes a ton of data and writes it to a file. My data is just a bunch of numbers from 0-16 (17 different values), and I have computed the frequency that each number appears in the data. I need to use as little disk space and as little RAM as possible, so I wrote a little Huffman encoding/decoding module in pure python that writes/read the compressed data with as few encoded symbols in memory at a time. Is there a module that comes with python that can do something similar? Here is the code with a little examples of how it will be used (WARNING the code is kinda long decently commented):

def makeTree(data):
    """data is a list of tuples, whos first entry is a priority/frequency
    number, and whos second entry is tuple containing the data to
    be encoded. The tree uses an internal tag system to tell where the
    branch ends. (End nodes are terminated with a False)"""
    def tag(data):
        taggedData = []
        for priority, datum in data:
            #all of the initial data is an end branch
            taggedData += [(priority, (datum, False))]
        return taggedData
    #get the tagged data into decending order of priority/frequency
    tree = sorted(tag(data), reverse=True)
    while len(tree)>1:
        #get the two lowest priority branches
        bottomR, bottomL = tree.pop(), tree.pop()
        #and stick them together into a new branch with combined priority
        new_elem = bottomR[0]+bottomL[0], ((bottomL, bottomR), True)
        #then add them back to the list of branches and sort
        tree += [new_elem]
        tree.sort(reverse=True)
    return tree[0]

def makeTable(tree, code=""):
    """Takes a tree such as generated by makeTree and returns a dictionary
    of code:value pairs."""
    #if this is an end branch, return code:value pair
    if tree[1][1]==False:
        return {code:tree[1][0]}
    #otherwise find the tables for the left and right branches
    #add them to the main table, and return
    table = {}
    table.update(makeTable(tree[1][0][0], code+'0')) #left
    table.update(makeTable(tree[1][0][1], code+'1')) #right
    return table

class Decoder:
    """this class creates a Decoder object which is used to decode a compressed
    file using the appropriate decoding table (duh). It used to be a function,
    but it was buggy and would also be ugly if I left it a function. (this
    class was written After the Encdoer class.)
    """
    def __init__(self, fname, table):
        self.file = open(fname)
        self.table = table
        self.byte = None
        self.bit = 7
        self.newByte = True

    def decode(self, size=1):
        """Decodes and yields size number of symbols from the file.
        Size defaults to 1"""
        #a counter for how many symbols were read
        read = 0
        code = ''
        while read> n)
                self.byte &= (1<



Is there a module that can do something similar faster? If not, are there ways I can change my code to make it faster?

cmd · Accepted Answer

In memory the different occurrences of the same value do only use up one location. The reference to them is repeated though.
For disk storage, I'd probably just compress it with zlib.

What is a faster way of compressing this data?

Answers (2)

Related Questions