Reputation: 1607
Take the following table, stored via pyarrow as Apache Parquet:
|   | id | regions      |
|---|----|--------------|
| 0 | A  | ['us', 'uk'] |
| 1 | B  | ['uk', 'mx'] |
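For reference, the file could be produced with something like this (the question doesn't show the exact write code, so this is just a sketch):
import pyarrow as pa
import pyarrow.parquet as pq

# Build the example table: a string "id" column and a "regions" column
# of string lists, then write it to the path used below.
table = pa.table({
    "id": ["A", "B"],
    "regions": [["us", "uk"], ["uk", "mx"]],
})
pq.write_table(table, "./example.parquet")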
I'd like to filter the regions column via parquet when loading data. Something like this:
import pyarrow.dataset as ds
dataset = ds.dataset("./example.parquet", format="parquet")
dataset.to_table(filter=ds.scalar('us').isin(ds.field('regions')))
The expectation is that I would get back the first row, but not the second row.
This, however, does not work. The documentation does not have any useful information on how to do this kind of operation. Is there any way of performing filters on more complex column types?
Upvotes: 1
Views: 1095
Reputation: 13902
As far as I can tell from the documentation you can't do that.
The supported operations are `<`, `<=`, `==`, `>=`, and `>`, as well as `isin`. I think what you want is `contains`, which isn't supported.
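For comparison, those operators do work on plain (non-list) columns; a filter like this on the question's file should run (a minimal sketch, not tested against your data):
import pyarrow.dataset as ds

# Equality on a scalar column is supported, e.g. selecting the row
# whose "id" equals "A".
dataset = ds.dataset("./example.parquet", format="parquet")
dataset.to_table(filter=ds.field("id") == "A")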
You can implement it yourself in arrow, but it's a bit of work:
import typing

import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow import compute

def filter_list_column(table: pa.Table, column: str, value: typing.Any) -> pa.Table:
    # Flatten the list column into one long array and record which
    # row each flattened element came from.
    flat_list = compute.list_flatten(table[column])
    flat_list_indices = compute.list_parent_indices(table[column])
    # Mark the elements equal to the value, map the matches back to
    # their parent row indices, and take those rows from the table.
    equal_mask = compute.equal(flat_list, value)
    equal_table_indices = compute.filter(flat_list_indices, equal_mask)
    return compute.take(table, equal_table_indices)

# Load the example file from the question and filter it.
table = ds.dataset("./example.parquet", format="parquet").to_table()
filter_list_column(table, "regions", "us")
Which gives you:
|   | id | regions     |
|---|----|-------------|
| 0 | A  | ['us' 'uk'] |
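If you need rows whose list contains any of several values, the same flatten-and-take idea should work with `compute.is_in` in place of `compute.equal`; here is a hypothetical `filter_list_column_any` sketch (`compute.unique` deduplicates the row indices so a row matching more than one value comes back only once):
import typing

import pyarrow as pa
from pyarrow import compute

def filter_list_column_any(table: pa.Table, column: str,
                           values: typing.Sequence) -> pa.Table:
    # Keep rows whose list column contains at least one of `values`.
    flat_list = compute.list_flatten(table[column])
    flat_list_indices = compute.list_parent_indices(table[column])
    member_mask = compute.is_in(flat_list, value_set=pa.array(values))
    matching_indices = compute.filter(flat_list_indices, member_mask)
    # A row could match several values; unique() keeps it only once.
    return compute.take(table, compute.unique(matching_indices))

filter_list_column_any(table, "regions", ["us", "mx"])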
Upvotes: 3