MasterGberry

Reputation: 2860

Compress data in Java and decompress in Python

So I am trying to compress (gzip or similar format) a JSON object before I throw it in my MySQL database. I am currently storing the data as BLOB. I have tried to use the following Java method to compress the data:

public static byte[] compress(String str) throws Exception {
    if (str == null || str.length() == 0) {
        return null;
    }

    ByteArrayOutputStream obj = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(obj);
    gzip.write(str.getBytes("UTF-8"));
    gzip.close();
    return obj.toByteArray();
}

and then store it in the database using setBytes() with a PreparedStatement, which works without issues. What I am having trouble with is decompressing the data in Python 2.7. I have tried using zlib.decompress() to no avail; it can't seem to read the data that Java is storing. I also need to write a conversion script in Python to compress the old rows into this new format, so whatever format I use needs to be readable by the Python decompress() whether it was compressed with Java or Python 2.7.

I am happy to provide any more information that would help in finding a solution to my dilemma.

Thanks.

EDIT: Some of the Python Code:

class KitPvPMatch(Base):
    """ The SQLAlchemy declarative model class for a KitPvPMatch object. """
    __tablename__ = 'kit_pvp_matches'
    __table_args__ = {
        'mysql_engine': 'InnoDB',
        'mysql_charset': 'utf8'
    }

    match_id = Column(INTEGER(11), autoincrement=True, primary_key=True, nullable=False)
    season = Column(Unicode(5), nullable=False)
    winner = Column(Unicode(16), nullable=False)
    loser = Column(Unicode(16), nullable=False)
    ladder_id = Column(TINYINT(4), nullable=False)
    data = Column(BLOB, nullable=False)

# The line in question
jsonData = json.loads(zlib.decompress(match.data))

# The error
error: Error -3 while decompressing data: incorrect header check

Upvotes: 4

Views: 2780

Answers (2)

ahfx

Reputation: 377

I'm rehashing the answer from https://stackoverflow.com/a/12572031/7298096 because this question is exactly the topic I was looking for. In my case the Java code compresses the content with DeflaterOutputStream and then encodes it with Base64.encodeBase64String. The error "Error -3 while decompressing data: incorrect header check" is resolved if I add 32 to the wbits argument of zlib.decompress, which makes zlib detect a zlib or gzip header automatically:

import base64
import zlib

data = "ENCODED_COMPRESSED_STRING_FROM_JAVA"

output_str = zlib.decompress(base64.b64decode(data), 32 + zlib.MAX_WBITS).decode('utf-8')
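The same wbits idea should carry over to the gzip data that the question's GZIPOutputStream produces, and to the conversion script that has to write the same format from Python. A minimal sketch, assuming the BLOB comes back as raw bytes; the helper names gzip_decompress and gzip_compress are hypothetical, not part of zlib:

```python
import zlib


def gzip_decompress(blob):
    # 32 + MAX_WBITS tells zlib to auto-detect a zlib or gzip header,
    # so the same call reads rows written by Java or by Python.
    return zlib.decompress(blob, 32 + zlib.MAX_WBITS)


def gzip_compress(raw):
    # 16 + MAX_WBITS makes zlib emit a gzip header and trailer,
    # the same container format that GZIPOutputStream writes.
    compressor = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return compressor.compress(raw) + compressor.flush()
```

With helpers like these, the failing line could become json.loads(gzip_decompress(match.data)), and old rows could be rewritten with gzip_compress(json_text.encode('utf-8')), where json_text stands for whatever the old column held.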

Upvotes: 1

Andrew Scott Evans

Reputation: 1033

Here is a post that goes over unzipping using zlib with a stream.

Otherwise, have you tried the gzip module (gzip.py)? You may need a temp file. The documentation for gzip is here. There is a fairly decent solution for this approach in the following post on decompression.
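If you go the gzip module route, a temp file can probably be avoided by wrapping the bytes in an in-memory buffer. A rough sketch, assuming blob holds the gzip bytes read from the database; gunzip_bytes is a hypothetical helper name:

```python
import gzip
import io


def gunzip_bytes(blob):
    # io.BytesIO gives GzipFile a file-like object backed by memory,
    # so nothing has to be written to disk first.
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode='rb') as fh:
        return fh.read()
```

This only reads gzip-format data, which is what GZIPOutputStream writes; raw zlib streams would still need zlib.decompress.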

If you haven't already, ensure that you are getting bytes back from SQL. Python is flexible so it may be a string. Call bytearray(string) on your string if this is the case.
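One quick way to check what you are getting back is to look at the first two bytes: gzip data always starts with 0x1f 0x8b. A small sketch, assuming the column value is a byte string, bytearray, or buffer-like object; looks_like_gzip is a hypothetical helper:

```python
GZIP_MAGIC = b'\x1f\x8b'


def looks_like_gzip(value):
    # Normalise byte-like values (bytes, bytearray, memoryview) to bytes,
    # then compare the first two bytes with the gzip magic number.
    data = bytes(bytearray(value))
    return data[:2] == GZIP_MAGIC
```

If this returns False for a row written by the Java code, the data is likely being altered somewhere between the driver and your model rather than by the decompression call itself.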

If that doesn't work:

  1. What format is the data in when returned by your SQL command?
  2. What error, if any, are you getting?

Upvotes: 1
