Simon

Reputation: 31

Error while converting csv to parquet file using pandas

I would like to convert a CSV file to Parquet and upload it to an S3 bucket. Below is the code snippet.

from io import BytesIO

import pandas as pd

df = pd.read_csv('right_csv.csv')
csv_buffer = BytesIO()
df.to_parquet(csv_buffer, compression='gzip', engine='fastparquet')
csv_buffer.seek(0)

The code above raises an error: TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO. How can I make it work?

Upvotes: 3

Views: 1105

Answers (2)

Kabilan Mohanraj

Reputation: 1906

As per the documentation, io.BytesIO cannot be used when fastparquet is the engine; the auto or pyarrow engine has to be used instead. Quoting from the documentation:

The engine fastparquet does not accept file-like objects.

The code below works without any issues.

import io
f = io.BytesIO()
df.to_parquet(f, compression='gzip', engine='pyarrow')
f.seek(0)

Upvotes: 3

0x26res

Reputation: 13882

As mentioned in the other answer, fastparquet does not support writing to a file-like object. One workaround is to write the Parquet file to a NamedTemporaryFile and then copy its contents into a BytesIO buffer:


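import io
import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    # fastparquet needs a real path, so write to the temp file by name
    df.to_parquet(tmp.name, compression='gzip', engine='fastparquet')
    # read the finished file back into an in-memory buffer
    with open(tmp.name, 'rb') as fh:
        buf = io.BytesIO(fh.read())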

Upvotes: 2
