Reputation: 1258
I'm importing Snappy-compressed Parquet files using the following script:
import pandas as pd
import glob
files = glob.glob('/home/....101.parquet/*.parquet')
df = pd.concat([pd.read_parquet(fp) for fp in files])
The final result, shown in the picture, does not look fully decompressed.
Upvotes: 0
Views: 621
Reputation: 13902
The data is fully decompressed, but some columns are struct types.
You can try flattening them by calling this:
import pandas as pd
import pyarrow.parquet as pq
# files is the list built with glob.glob(...) in the question
df = pd.concat([pq.read_table(fp).flatten().to_pandas() for fp in files])
But I'm not sure it will fully help, because some of the columns appear to contain arrays, and flatten() only expands struct columns.
Upvotes: 1