Anonymous Person

Reputation: 1540

Unable to read a parquet file

I am breaking my head over this right now. I am new to Parquet files, and I am running into a LOT of issues with them.

Each time I try to create a DataFrame from the file, I get this error: OSError: Passed non-file path: \datasets\proj\train\train.parquet

I've tried both of these:

pq.read_pandas(r'E:\datasets\proj\train\train.parquet').to_pandas()

od = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='pyarrow')

I also changed the drive letter of the drive where the dataset resides, and it's the SAME THING!

It's the same with all engines.

PLEASE HELP!

Upvotes: 10

Views: 19632

Answers (2)

Udi Yosovzon

Reputation: 151

Try using fastparquet as the engine; it worked for me:

df = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='fastparquet')
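
If it is not clear which engine will work on a given machine, one option is to try each in turn. The following is only a sketch: `read_with_fallback` and its `reader` parameter are hypothetical names introduced here for illustration, not part of pandas. By default it simply calls `pd.read_parquet` with each engine name.

```python
def read_with_fallback(path, engines=("pyarrow", "fastparquet"), reader=None):
    """Try each engine in order and return the first successful result.

    `reader` is a hypothetical hook (not a pandas feature); it defaults to
    pandas.read_parquet and must accept (path, engine=...).
    """
    if reader is None:
        # Imported lazily so the helper can be defined without pandas installed.
        import pandas as pd
        reader = pd.read_parquet

    errors = {}
    for engine in engines:
        try:
            return reader(path, engine=engine)
        except (OSError, ImportError, ValueError) as exc:
            # Remember why this engine failed and move on to the next one.
            errors[engine] = exc
    raise OSError(f"no engine could read {path!r}: {errors}")
```

Usage would look like `df = read_with_fallback(r'E:\datasets\proj\train\train.parquet')`. If every engine fails, the raised error lists the per-engine failures, which may help narrow down whether the problem is the path or the engine.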

Upvotes: 2

Uwe L. Korn

Reputation: 8826

This might be a problem with Arrow's file path handling. You could instead pass in an already opened file:

import pandas as pd

with open(r'E:\datasets\proj\train\train.parquet', 'rb') as f:
    df = pd.read_parquet(f, engine='pyarrow')
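
Before blaming the library, it may also be worth checking what Python itself sees at that path. The sketch below uses only the standard library; `describe_parquet_path` is a hypothetical helper name, and the check relies on the fact that unencrypted Parquet files begin and end with the 4-byte marker `PAR1`. It does not validate the file contents beyond that.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # marker at the start and end of an unencrypted Parquet file


def describe_parquet_path(path):
    """Return a short diagnosis of what exists at `path` (hypothetical helper)."""
    p = Path(path)
    if not p.exists():
        return "missing"
    if p.is_dir():
        return "directory"
    # A valid file needs at least the leading and trailing markers.
    if p.stat().st_size < 2 * len(PARQUET_MAGIC):
        return "not parquet"
    with p.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), 2)  # 2 = seek relative to end of file
        tail = f.read(len(PARQUET_MAGIC))
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "looks like parquet"
    return "not parquet"
```

For example, `describe_parquet_path(r'E:\datasets\proj\train\train.parquet')` returning "missing" or "directory" would point at the path rather than at the engine.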

Upvotes: 10
