Reputation: 503
How can I open a .snappy.parquet file in Python 3.5? So far, I have used this code:
import numpy
import pyarrow
filename = "/Users/T/Desktop/data.snappy.parquet"
df = pyarrow.parquet.read_table(filename).to_pandas()
But it gives this error:
AttributeError: module 'pyarrow' has no attribute 'compat'
P.S. I installed pyarrow this way:
pip install pyarrow
Upvotes: 10
Views: 28920
Reputation: 121
You can use pandas to read .snappy.parquet files into a pandas DataFrame.
import pandas as pd
filename = "/Users/T/Desktop/data.snappy.parquet"
df = pd.read_parquet(filename)
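If you want to control which backend pandas uses, pd.read_parquet accepts an engine argument ("auto", "pyarrow" or "fastparquet") and a columns argument. The sketch below is only an illustration: the helper name load_snappy_parquet is hypothetical, and it assumes pandas plus at least one Parquet engine are installed.

```python
def load_snappy_parquet(path, engine="auto", columns=None):
    """Read a Parquet file into a pandas DataFrame.

    Hypothetical helper for illustration. The compression codec is recorded
    in the file's metadata, so no extra argument is passed for Snappy here.
    """
    # Imported lazily so that defining the helper does not require pandas.
    import pandas as pd

    # engine="auto" lets pandas pick an installed engine;
    # columns=None reads every column.
    return pd.read_parquet(path, engine=engine, columns=columns)
```

For example, load_snappy_parquet(filename, engine="pyarrow") forces the pyarrow backend instead of letting pandas choose.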
Upvotes: 8
Reputation: 159
I had the same issue and managed to solve it by following the solution proposed in https://github.com/dask/fastparquet/issues/366.
1) Install python-snappy using conda install (for some reason, I couldn't download it with pip install).
2) Add the snappy_decompress function.
from fastparquet import ParquetFile
import snappy
def snappy_decompress(data, uncompressed_size):
    return snappy.decompress(data)
pf = ParquetFile('filename')  # filename includes .snappy.parquet extension
dff = pf.to_pandas()
Upvotes: 8
Reputation: 8816
The error AttributeError: module 'pyarrow' has no attribute 'compat' is sadly a bit misleading. To execute the to_pandas() function on a pyarrow.Table instance you need pandas installed. The above error is a symptom of the missing requirement.
pandas is not a hard requirement of pyarrow, as most of its functionality is usable with just Python built-ins and NumPy. Thus users of pyarrow who do not need pandas can work with it without having pandas installed.
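One way to get a clearer message than the AttributeError is to check for both packages before calling to_pandas(). This is a minimal sketch, not part of pyarrow: the helper names missing_modules and read_parquet_as_dataframe are made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def read_parquet_as_dataframe(path):
    """Read a Parquet file into a pandas DataFrame, checking dependencies first."""
    missing = missing_modules(["pyarrow", "pandas"])
    if missing:
        raise ImportError("Missing packages, try: pip install " + " ".join(missing))
    # Import the submodule explicitly rather than relying on `import pyarrow`.
    import pyarrow.parquet as pq

    return pq.read_table(path).to_pandas()
```

With pandas missing, read_parquet_as_dataframe(filename) raises an ImportError that names the package to install, rather than failing inside to_pandas().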
Upvotes: 4