elcomendante

Reputation: 1161

python 3 mac: snappy.compress AttributeError: module 'snappy' has no attribute 'compress'

Has anyone solved this error when reading Parquet in Python? The line compressions['SNAPPY'] = snappy.compress fails with AttributeError: module 'snappy' has no attribute 'compress'. Also, is there a way to read a whole directory?

I am using Python 3 through conda on a Mac, with snappy and thrift installed as per https://pypi.python.org/pypi/parquet

The code is as follows:

import parquet
import json
import fastparquet

with open(data_in_path + "file.parquet/part-01snappy.parquet", 'rb') as fo:
    for row in parquet.DictReader(fo, columns=['id', 'title']):
        print(json.dumps(row))

or

df2 = fastparquet.ParquetFile(path).to_pandas()

Upvotes: 0

Views: 3249

Answers (2)

Javier Alba

Reputation: 411

I had the same issue.

The reason was that I had installed the wrong Python package. You should install python-snappy instead of snappy.

In my case (OS X), it was a simple two-step process:

brew install snappy
pip install python-snappy
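
To confirm which package the interpreter actually picks up, one option is to check whether the imported module exposes compress and decompress. This is only a rough sketch: the diagnose helper and its return labels are hypothetical names made up for illustration, not part of python-snappy, parquet, or fastparquet.

```python
import importlib
import importlib.util


def diagnose(module_name="snappy"):
    """Report whether module_name looks like a usable compression binding.

    Hypothetical helper for illustration. Returns one of:
    "missing", "broken-import", "wrong-package", "ok".
    """
    # Nothing importable under that name at all.
    if importlib.util.find_spec(module_name) is None:
        return "missing"
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package exists but fails to load (e.g. native library not found).
        return "broken-import"
    # The error in the question means this attribute check fails.
    if hasattr(module, "compress") and hasattr(module, "decompress"):
        return "ok"
    return "wrong-package"


if __name__ == "__main__":
    print(diagnose("snappy"))
```

If this reports "wrong-package", uninstalling the snappy package before installing python-snappy is probably worth trying, since both are imported under the same name.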

Upvotes: 2

elcomendante

Reputation: 1161

I was not able to find a solution for snappy, so I read the data in Spark with snappy and wrote it back with gzip. After that, no issues came up in Python:

df.coalesce(1).write.mode("overwrite").option("compression","gzip").parquet("dfWithGzip.parquet")
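
On the side question of reading a whole directory: Spark writes its output as a directory of part files, so one possible approach is to collect the part files and hand the list to fastparquet. The sketch below assumes that fastparquet.ParquetFile accepts a list of paths belonging to one dataset; list_part_files and read_parquet_dir are hypothetical helper names, and the "*.parquet" pattern is an assumption about how the part files are named.

```python
from pathlib import Path


def list_part_files(directory, pattern="*.parquet"):
    """Return sorted paths of the part files directly inside directory.

    Marker files such as _SUCCESS and checksum files ending in .crc
    do not match the default pattern and are skipped.
    """
    return sorted(str(p) for p in Path(directory).glob(pattern) if p.is_file())


def read_parquet_dir(directory):
    """Load all part files in directory into one pandas DataFrame."""
    # Imported lazily so that list_part_files works without fastparquet.
    import fastparquet

    files = list_part_files(directory)
    if not files:
        raise FileNotFoundError(f"no Parquet part files in {directory}")
    # Assumption: a list of paths is treated as a single dataset.
    return fastparquet.ParquetFile(files).to_pandas()
```

Usage would look like df2 = read_parquet_dir("dfWithGzip.parquet"), where the argument is the output directory rather than a single part file.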

Upvotes: 0
