reading parquet to pandas FileNotFoundError

Question

I have code as below and it runs fine. It reads as a spark dataframe

April_data = sc.read.parquet('somepath/data.parquet')
type(April_data)
pyspark.sql.dataframe.DataFrame

But when I try to read as a pandas df I get error

df_pp = pd.read_parquet('somepath/data.parquet')

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_4244/1910461502.py in 
----> 1 df_pp = pd.read_parquet('somepath/data.parquet')

/usr/local/anaconda//parquet.py in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, **kwargs)
    498         storage_options=storage_options,
    499         use_nullable_dtypes=use_nullable_dtypes,
--> 500         **kwargs,
    501     )

/usr/local/anaconda//io/parquet.py in read(self, path, columns, use_nullable_dtypes, storage_options, **kwargs)
    234             kwargs.pop("filesystem", None),
    235             storage_options=storage_options,
--> 236             mode="rb",
    237         )
    238         try:

/usr/local/anaconda/parquet.py in _get_path_or_handle(path, fs, storage_options, mode, is_dir)
    100         # this branch is used for example when reading from non-fsspec URLs
    101         handles = get_handle(
--> 102             path_or_handle, mode, is_text=False, storage_options=storage_options
    103         )
    104         fs = None

/usr/local/anaconda/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    709         else:
    710             # Binary mode
--> 711             handle = open(handle, ioargs.mode)
    712         handles.append(handle)
    713 

FileNotFoundError: [Errno 2] No such file or directory: 'somepath/data.parquet'

I have installed fastparquet package as below

!pip install fastparquet
Successfully installed cramjam-2.5.0 fastparquet-0.8.1

# udpate 1

the file is located in HDFS and I can see the file when I do

hdfs_location = 'somepath/'
!hdfs dfs -ls $hdfs_location

I am running all this code in the same file

reading parquet to pandas FileNotFoundError

Answers (1)

Related Questions