SnG

Reputation: 402

How to index a PyArrow Table?

I'm currently using Arrow in my machine learning model to read data from Parquet, and I'm trying to figure out how to get certain records from an Arrow table. I see that an Arrow Table has a "take" API, but I'm not sure how to use it. I tried passing in an int index, but when I do that I get the following exception:

Got unexpected argument type <class 'int'> for compute function

Does anyone know how I can read records from an Arrow table?

Upvotes: 1

Views: 3262

Answers (1)

joris

Reputation: 139162

The take() method of a pyarrow Table needs an array-like of indices (and not a single integer index):

>>> import pyarrow as pa
>>> table = pa.table({'a': range(5)})
>>> table.to_pandas()
   a
0  0
1  1
2  2
3  3
4  4

>>> table.take([0, 2]).to_pandas()
   a
0  0
1  2

Upvotes: 1
