Reputation: 195
I need help vectorizing a function in NumPy. In Julia, I can do something like this:
((a,b,c) -> [a,b,c]).([[1,2],[3,4]],[[5,6],[7,8]],nothing)
which returns
2-element Vector{Vector{Union{Nothing, Vector{Int64}}}}:
[[1, 2], [5, 6], nothing]
[[3, 4], [7, 8], nothing]
It takes one sublist at a time from each iterable and expands nothing.
In Python, I can't get similar behaviour. I tried:
np.vectorize(lambda a,b,c: [a,b,c])([[1,2], [3,4]], [[5,6], [7,8]], None)
but it returns:
array([[list([1, 5, None]), list([2, 6, None])],
       [list([3, 7, None]), list([4, 8, None])]], dtype=object)
If I do:
np.vectorize(lambda a,b,c: print(a,b,c))([[1,2], [3,4]], [[5,6], [7,8]], np.nan)
I get back:
1 5 nan
1 5 nan
2 6 nan
3 7 nan
4 8 nan
I tried the excluded parameter, but it excludes the whole array:
np.vectorize(lambda a,b,c: print(a,b,c), excluded=[0])([[1,2], [3,4]], [[5,6], [7,8]], np.nan)
prints:
[[1, 2], [3, 4]] 5 nan
[[1, 2], [3, 4]] 5 nan
[[1, 2], [3, 4]] 6 nan
[[1, 2], [3, 4]] 7 nan
[[1, 2], [3, 4]] 8 nan
By the way, the actual function is a sklearn function, not a lambda.
Upvotes: 0
Views: 894
Reputation: 231375
You gave it (2,2), (2,2), and scalar arguments. np.vectorize called your function 4 times, each time with a tuple of values taken from those 3 arguments (broadcast together).
You also see that with the print version. There's an additional call at the start (the repeated 1 5 nan), used to determine the return dtype. In your first example the function returns a list, so the result has dtype=object.
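As a rough illustration of that broadcasting, np.broadcast can be used to look at the tuples that get paired up. This is only a sketch: the record function is a made-up stand-in for your lambda, and otypes is passed so that the extra dtype-probing call is skipped.

```python
import numpy as np

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# np.broadcast pairs elements up following the same broadcasting rules:
# two (2,2) inputs and a scalar broadcast to a (2,2) grid of 3-tuples.
bc = np.broadcast(a, b, np.nan)
print(bc.shape)  # (2, 2)
tuples = list(bc)
for t in tuples:
    print(t)

# Hypothetical recording function: store each set of arguments it receives.
calls = []

def record(x, y, z):
    calls.append((x, y, z))
    return [x, y, z]

# With otypes given, vectorize does not need a trial call to find the dtype.
out = np.vectorize(record, otypes=[object])(a, b, None)
print(len(calls))  # 4
```

Each call receives scalars, never a whole sublist, which is why the result is a (2,2) array of 3-element lists.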
With excluded, it doesn't iterate on the values of the 1st argument; it just passes it whole.
Here's the right way to create your list of lists:
In [811]: a,b,c = [[1,2], [3,4]], [[5,6], [7,8]], None
In [813]: [[i,j,None] for i,j in zip(a,b)]
Out[813]: [[[1, 2], [5, 6], None], [[3, 4], [7, 8], None]]
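Since your real function is not a lambda, the same comprehension can be wrapped in a small helper. This is a minimal sketch; apply_rowwise is a name I made up, and the lambda stands in for whatever function you actually call.

```python
def apply_rowwise(func, a, b, c):
    """Call func(row_a, row_b, c) once per pair of rows; c is passed unchanged."""
    return [func(i, j, c) for i, j in zip(a, b)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# The lambda is a placeholder for the real (e.g. sklearn) function.
result = apply_rowwise(lambda x, y, z: [x, y, z], a, b, None)
print(result)  # [[[1, 2], [5, 6], None], [[3, 4], [7, 8], None]]
```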
If we add a signature (and otypes):
In [821]: f = np.vectorize(lambda a,b,c: [a,b,c], signature='(n),(n),()->()', otypes=[object])
In [822]: f(a,b,c)
Out[822]:
array([list([array([1, 2]), array([5, 6]), None]),
       list([array([3, 4]), array([7, 8]), None])], dtype=object)
Now it calls the function only twice, but this version is much slower. Read, and reread, the notes about performance in the np.vectorize documentation.
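If you want to check both claims yourself, something like the following should do. It is only a sketch: combine and the counter dict are illustrative names, and the timings depend on your machine, so no numbers are shown.

```python
import timeit

import numpy as np

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
counter = {"n": 0}


def combine(x, y, z):
    # Count how often vectorize actually calls the function.
    counter["n"] += 1
    return [x, y, z]


f = np.vectorize(combine, signature='(n),(n),()->()', otypes=[object])
result = f(a, b, None)
calls_single = counter["n"]
print(calls_single)  # 2

# Compare against the plain list comprehension on the same inputs.
t_vec = timeit.timeit(lambda: f(a, b, None), number=200)
t_list = timeit.timeit(lambda: [[i, j, None] for i, j in zip(a, b)], number=200)
print(f"vectorize: {t_vec:.4f}s, list comprehension: {t_list:.4f}s")
```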
If we make the list arguments into arrays first:
In [825]: A,B = np.array(a), np.array(b)
In [826]: A,B
Out[826]:
(array([[1, 2],
        [3, 4]]),
 array([[5, 6],
        [7, 8]]))
the signature version of f returns the same thing, showing that vectorize does convert the lists to arrays:
In [827]: f(A,B,c)
Out[827]:
array([list([array([1, 2]), array([5, 6]), None]),
       list([array([3, 4]), array([7, 8]), None])], dtype=object)
If we pass the arrays to the list comprehension instead, we get:
In [829]: np.array([[i,j,None] for i,j in zip(A,B)], object)
Out[829]:
array([[array([1, 2]), array([5, 6]), None],
       [array([3, 4]), array([7, 8]), None]], dtype=object)
In [830]: _.shape
Out[830]: (2, 3)
Upvotes: 1