user9439906

Reputation: 503

How can I open a .snappy.parquet file in Python?

How can I open a .snappy.parquet file in Python 3.5? So far, I have used this code:

import numpy
import pyarrow

filename = "/Users/T/Desktop/data.snappy.parquet" 
df = pyarrow.parquet.read_table(filename).to_pandas()

But it gives this error:

AttributeError: module 'pyarrow' has no attribute 'compat'

P.S. I installed pyarrow this way:

pip install pyarrow

Upvotes: 10

Views: 28920

Answers (3)

Nikil Kumar

Reputation: 121

You can use pandas to read .snappy.parquet files into a pandas DataFrame:

import pandas as pd
filename = "/Users/T/Desktop/data.snappy.parquet"
df = pd.read_parquet(filename)

Upvotes: 8

Bengi Koseoglu

Reputation: 159

I had the same issue and managed to solve it by following the solution proposed in https://github.com/dask/fastparquet/issues/366.

1) Install python-snappy with conda install (for some reason, I couldn't install it with pip install).

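2) Define a snappy_decompress function and register it as fastparquet's SNAPPY decompressor, then read the file:

import snappy
from fastparquet import ParquetFile, compression
# Wrap python-snappy in the (data, uncompressed_size) signature fastparquet's codecs use
def snappy_decompress(data, uncompressed_size):
    return snappy.decompress(data)

# Register the codec (assumes an older fastparquet that keeps codecs in compression.decompressions)
compression.decompressions['SNAPPY'] = snappy_decompress
filename = "/Users/T/Desktop/data.snappy.parquet"  # path includes the .snappy.parquet extension
pf = ParquetFile(filename)
dff = pf.to_pandas()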

Upvotes: 8

Uwe L. Korn

Reputation: 8816

The error AttributeError: module 'pyarrow' has no attribute 'compat' is sadly a bit misleading. To call to_pandas() on a pyarrow.Table instance, you need pandas installed; the above error is a symptom of that missing requirement, so installing pandas (for example with pip install pandas) resolves it.

pandas is not a hard requirement of pyarrow, as most of its functionality is usable with just Python built-ins and NumPy. Thus, users of pyarrow who don't need pandas can work with it without having pandas installed.

Upvotes: 4
