Dario Federici

Reputation: 1258

Python not fully decompressing snappy parquet

I'm importing parquet snappy files using the following script:

import pandas as pd
import glob

files = glob.glob('/home/....101.parquet/*.parquet')
df = pd.concat([pd.read_parquet(fp) for fp in files])

The resulting DataFrame, shown in the picture below, does not look fully decompressed.

[Image: Dataframe]

Upvotes: 0

Views: 621

Answers (1)

0x26res

Reputation: 13902

The data is fully decompressed, but some columns are struct types.

You can try flattening them by calling this:

import pandas as pd
import pyarrow.parquet as pq

# `files` is the list of parquet paths from the question
pd.concat([pq.read_table(fp).flatten().to_pandas() for fp in files])

But I'm not sure it will fully help, because it looks like some of the columns contain arrays (list types), and flatten() only expands struct columns.

Upvotes: 1
