How to Snappy-compress a file using a Python script

I am trying to compress a CSV file into Snappy format using a Python script and the python-snappy module. This is my code so far:

import snappy
d = snappy.compress("C:\\Users\\my_user\\Desktop\\Test\\Test_file.csv")
with open("compressed_file.snappy", 'w') as snappy_data:
     snappy_data.write(d)
snappy_data.close()

This code actually creates a snappy file, but the snappy file created only contains a string: "C:\Users\my_user\Desktop\Test\Test_file.csv"

So I am a bit lost on how to get my CSV compressed. I did get it working from the Windows command prompt with this command:

python -m snappy -c Test_file.csv compressed_file.snappy

But I need it to run as part of a Python script, so using the command prompt is not an option for me.

Thank you very much, Álvaro

Upvotes: 1

Views: 10645

Answers (1)

GlynD

Reputation: 442

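You are compressing the path string itself, because the compress function takes the raw data to compress, not a file name. You need to read the file's contents and compress those.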

There are two ways to compress data with snappy: as a single block, or as streaming (framed) data.
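For the single-block case, a minimal sketch might look like the following. It reads the file's bytes and passes those to snappy.compress, rather than the path. The helper name compress_file_block and its optional compress parameter are assumptions made for illustration and are not part of python-snappy; the parameter only exists so another codec can be substituted.

```python
from pathlib import Path


def compress_file_block(src_path, dst_path, compress=None):
    """Compress the whole file at src_path as one block (hypothetical helper)."""
    if compress is None:
        # Imported lazily so the file handling can be tried with another codec.
        import snappy  # provided by the python-snappy package
        compress = snappy.compress

    raw = Path(src_path).read_bytes()           # the file's contents, not its path
    Path(dst_path).write_bytes(compress(raw))   # compressed output is bytes
    return dst_path
```

Calling compress_file_block("Test_file.csv", "compressed_file.snappy") would then write the compressed contents of the CSV. This loads the whole file into memory, so it suits small files; the output is read back by passing the file's bytes to snappy.uncompress.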

This function compresses a file using the framed method:

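import snappy

def snappy_compress(path):
    # Write the framed (streaming) output next to the input file.
    path_to_store = path + '.snappy'

    # Both files must be opened in binary mode: snappy reads and writes bytes.
    with open(path, 'rb') as in_file, open(path_to_store, 'wb') as out_file:
        snappy.stream_compress(in_file, out_file)

    # The with block closes both files, so no explicit close() is needed.
    return path_to_store

snappy_compress('testfile.csv')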

You can decompress from command line using:

python -m snappy -d testfile.csv.snappy testfile_decompressed.csv
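Since the question asks for everything to happen inside a script, the same decompression can be sketched in Python with snappy.stream_decompress. The function name snappy_decompress and its optional decompress parameter are assumptions for illustration only; by default it uses python-snappy's framed decompressor.

```python
def snappy_decompress(src_path, dst_path, decompress=None):
    """Decompress a framed snappy file to dst_path (hypothetical helper)."""
    if decompress is None:
        # Imported lazily so the file handling can be tried with another codec.
        import snappy  # provided by the python-snappy package
        decompress = snappy.stream_decompress

    # Binary mode on both sides: the codec reads and writes bytes.
    with open(src_path, 'rb') as in_file, open(dst_path, 'wb') as out_file:
        decompress(in_file, out_file)

    return dst_path
```

For example, snappy_decompress('testfile.csv.snappy', 'testfile_decompressed.csv') would mirror the command above.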

Note that the framing currently used by python-snappy is not compatible with the framing used by Hadoop.

Upvotes: 2
