pyarrow read parquet via column index or order?

Question

Is there a workaround to selectively read parquet files via column index instead of column name?

Documentation shows reading via column name:

pq.read_table('example.parquet', columns=['one', 'three'])

What I'm looking for is something like:

pq.read_table('example.parquet', columns=[0, 2])

Similar question: Pandas Read/Write Parquet Data using Column Index

Update with attempt

This works, but it is redundant: it loads the whole file just to get the column names, so I might as well read everything and drop the unwanted columns in memory with pandas or NumPy.

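import pyarrow.parquet as pq

desired_cols = [0, 2]

# Read the whole file once just to get the column names
pat = pq.read_table('file.parquet.gzip')
cols_names = pat.column_names
del pat

# Map positions to names, then read only those columns
desired_cols = [cols_names[c] for c in desired_cols]
pq.read_table('file.parquet.gzip', columns=desired_cols)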

"""
pyarrow.Table
anzsic06: string
year: int64
"""
