puffles

Reputation: 372

How to read a Parquet file using Pandas

I am trying to read a parquet file using Python 3.6.

import pandas as pd

df = pd.read_parquet('smalldata.parquet')
df.head()

However, this raises an error saying that module 'pandas' has no attribute 'read_parquet'. Which dependencies do I need to install to solve this problem?

Edit 1:

I tried updating Pandas, and this is the pip output:

Requirement already up-to-date: pandas in /home/fatima/miniconda2/lib/python2.7/site-packages (0.24.2)
Requirement already satisfied, skipping upgrade: pytz>=2011k in /home/fatima/miniconda2/lib/python2.7/site-packages (from pandas) (2018.9)
Requirement already satisfied, skipping upgrade: numpy>=1.12.0 in /home/fatima/miniconda2/lib/python2.7/site-packages (from pandas) (1.16.2)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.5.0 in /home/fatima/miniconda2/lib/python2.7/site-packages (from pandas) (2.8.0)
Requirement already satisfied, skipping upgrade: six>=1.5 in /home/fatima/miniconda2/lib/python2.7/site-packages (from python-dateutil>=2.5.0->pandas) (1.12.0)

Edit 2: this is what conda list gives me

pandas                    0.24.2                   pypi_0    pypi

Upvotes: 2

Views: 7304

Answers (2)

Alex

Reputation: 97

You will need to install the required packages:

pip install pandas pyarrow s3fs fastparquet
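Before reinstalling, it may be worth confirming which interpreter and which pandas the script actually picks up. The pip output in the question points at a python2.7 site-packages directory, while the script is said to run on Python 3.6, so the upgrade may have landed somewhere else. Below is a minimal sketch of such a check, to be run with the same interpreter as the script. The helper names `has_read_parquet` and `report` are made up for illustration; the version threshold relies on `read_parquet` having been added in pandas 0.21.0.

```python
import re
import sys

try:
    import pandas as pd
except ImportError:
    pd = None  # pandas is not installed for this interpreter


def has_read_parquet(pandas_version):
    """Return True if the version string is 0.21 or newer.

    Hypothetical helper: pd.read_parquet first appeared in pandas 0.21.0.
    """
    match = re.match(r"(\d+)\.(\d+)", pandas_version)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= (0, 21)


def report():
    """Print the interpreter in use and the pandas it imports."""
    print("interpreter:", sys.executable)
    if pd is None:
        print("pandas is not installed for this interpreter")
        return
    print("pandas version:", pd.__version__)
    print("pandas location:", pd.__file__)
    print("read_parquet expected:", has_read_parquet(pd.__version__))


report()
```

If the interpreter path printed here does not match the directory pip installed into, running the install as `python -m pip install ...` with that same interpreter is one way to make the two agree.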

Upvotes: 1

qxzsilver

Reputation: 655

If you are trying to read Parquet files in Pandas, you may not have one of the Parquet engines installed, such as pyarrow or fastparquet. Pandas read_parquet requires one of these engines in order to read Parquet files, so you would need to install at least one of them. Each of these libraries also has its own dependencies, which need to be installed as well.
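A quick way to see whether either engine is visible to your interpreter is sketched below. It only checks whether the modules can be found, without importing them; the function name `available_parquet_engines` is made up for illustration.

```python
import importlib.util

# The two engines that pandas read_parquet can use.
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def available_parquet_engines():
    """Return the engine names that can be located by this interpreter."""
    return [
        name
        for name in PARQUET_ENGINES
        if importlib.util.find_spec(name) is not None
    ]


engines = available_parquet_engines()
if engines:
    print("Parquet engines found:", ", ".join(engines))
else:
    print("No Parquet engine found; install pyarrow or fastparquet")
```

If an engine is found, it can be requested explicitly through the `engine` argument, for example `pd.read_parquet('smalldata.parquet', engine='pyarrow')`.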

If this isn't the issue, could you post the exact error you are encountering?

Upvotes: 0
