Reputation: 300
This program uses sockets to transfer highly redundant 2D byte arrays (image-like). While the transfer rate is comparatively high (10 Mbps), the arrays are also highly redundant (e.g., each row may contain runs of consecutive similar values). I have tried zlib and lz4 and the results were promising, but I am still looking for a better compression method, and please remember that it should be relatively fast, as lz4 is. Any suggestions?
Upvotes: 6
Views: 695
Reputation: 1260
You could create your own: if the data in the rows is similar, you can create a value/index map, substantially reducing the size. Something like this:
Original file:
row 1: 1212, 34,45,1212,45,34,56,45,56
row 2: 34,45,1212,78,54,87,....
You could create a sorted list of unique values, then use each value's index in its place:
34,45,54,56,78,87,1212
row 1: 6,0,1,6,1,0,3,1,3
This can potentially save you over 30% or more on data transfer, but it depends on how redundant the data is.
UPDATE
Here is a simple implementation:
std::set<int> uniqueValues;
DataTable my2dData; // assuming a 2D vector implementation
std::string indexMap;
std::string fileCompressed = "";

// return the position of value within the sorted set, or -1 if absent
int Find(int value){
    int i = 0;
    for(int v : uniqueValues){
        if(v == value) return i;
        ++i;
    }
    return -1;
}

// create the list of unique values
for(size_t i = 0; i < my2dData.size(); ++i){
    for(size_t j = 0; j < my2dData[i].size(); ++j){
        uniqueValues.insert(my2dData[i][j]);
    }
}

// create the indexes, one comma-separated row per line
for(size_t i = 0; i < my2dData.size(); ++i){
    std::string tmpRow = "";
    for(size_t j = 0; j < my2dData[i].size(); ++j){
        if(tmpRow.empty()){
            tmpRow = std::to_string(Find(my2dData[i][j]));
        }
        else{
            tmpRow += "," + std::to_string(Find(my2dData[i][j]));
        }
    }
    tmpRow += "\r\n";
    indexMap += tmpRow;
}

// create the payload to transfer
for(int v : uniqueValues){
    if(fileCompressed.empty()){
        fileCompressed = "i: " + std::to_string(v);
    }
    else{
        fileCompressed += "," + std::to_string(v);
    }
}
fileCompressed += "\r\nd: " + indexMap;
Now on the receiving end you just do the opposite: if the line starts with "i" you read the value table, and if it starts with "d" you read the index data.
Upvotes: 2
Reputation: 112502
You should look at the PNG filter algorithms for preprocessing image data before compressing. They range from simple to more sophisticated methods for predicting values in a 2D array based on previous values. To the extent that the predictions are good, the filtering can make for dramatic improvements in the subsequent compression step.
You should simply try these filters on your data, and then feed it to lz4.
Upvotes: 4