Reputation: 1161
Has anyone solved this error when reading parquet in Python?
compressions['SNAPPY'] = snappy.compress
AttributeError: module 'snappy' has no attribute 'compress'
Btw, is there a way to read a whole directory?
I am using Python 3 through conda, on a Mac, with snappy and thrift installed as per https://pypi.python.org/pypi/parquet. Code as follows:
import parquet
import json
import fastparquet

with open(data_in_path + "file.parquet/part-01snappy.parquet", 'rb') as fo:
    for row in parquet.DictReader(fo, columns=['id', 'title']):
        print(json.dumps(row))
or
df2 = fastparquet.ParquetFile(path).to_pandas()
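As for the side question of reading a whole directory: one option (a sketch only, assuming the part files follow the usual part-*.parquet naming) is to glob the part files and concatenate them:

import glob

import fastparquet
import pandas as pd

# Sketch: collect all part files under the dataset directory
# (data_in_path is the same variable used in the question's code).
part_files = sorted(glob.glob(data_in_path + "file.parquet/part-*.parquet"))

# Read each part and stack them into a single DataFrame.
df = pd.concat(
    (fastparquet.ParquetFile(f).to_pandas() for f in part_files),
    ignore_index=True,
)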
Upvotes: 0
Views: 3249
Reputation: 411
I had the same issue. The reason was that I had installed the wrong Python package: you should install python-snappy instead of snappy. In my case (OS X) it was a simple two-step process:
brew install snappy
pip install python-snappy
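A minimal check that the right package is now on the path: python-snappy exposes compress/decompress at module level, while the unrelated snappy package does not, which is exactly what triggers the AttributeError.

import snappy

# With python-snappy installed, these module-level functions exist;
# with the wrong "snappy" package they are missing.
blob = snappy.compress(b"hello parquet")
assert snappy.decompress(blob) == b"hello parquet"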
Upvotes: 2
Reputation: 1161
I was not able to find a snappy solution, so I read the data in Spark (which handles snappy fine) and wrote it back with gzip compression; after that, Python reads it with no issues:

df.coalesce(1).write.mode("overwrite").option("compression", "gzip").parquet("dfWithGzip.parquet")
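After the rewrite, the file reads back in Python without any snappy dependency, since gzip decompression ships with the standard library (a sketch, using the output path from the Spark write above):

import fastparquet

# gzip-compressed parquet needs no extra native package, unlike snappy.
df2 = fastparquet.ParquetFile("dfWithGzip.parquet").to_pandas()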
Upvotes: 0