Reputation: 7632
The array contains Python objects and is part of a table. I need to perform a calculation element-wise. The calculation itself returns a list of numbers, which should then become new columns in the table.
I looked at the documentation but don't see any way to iterate over the pyarrow array. Is there a way, or do I have to convert it to a numpy array first? (That is what the documentation example for user-defined functions shows.)
Upvotes: 0
Views: 1303
Reputation: 1791
You can iterate over ChunkedArrays directly; they support the iterable protocol:
>>> import pyarrow as pa
>>> a = pa.chunked_array([[1,2,3], [4,5,6]])
>>> for x in a: print(x)
...
1
2
3
4
5
6
But that's rarely what you want to do, because iterating element by element in Python is fairly slow. As much as possible, build your algorithm as a combination of compute functions ( https://arrow.apache.org/docs/python/api/compute.html ) applied to the whole array.
The User Defined Functions example converts the pyarrow array to a numpy array because it wants to use the numpy.gcd function ( https://numpy.org/doc/stable/reference/generated/numpy.gcd.html ), which operates on numpy arrays.
Upvotes: 1