My question is: how do I broadcast values in np.where when using multiple conditions/outputs, without having to rely on multiplication?
Input:
import numpy as np
import pandas as pd
df = pd.DataFrame({'test':range(0,10)})
   test
0     0
1     1
2     2
3     3
4     4
5     5
6     6
7     7
8     8
9     9
Expected Output:
   test  column1  column2
0     0        2        4
1     1        2        4
2     2        2        4
3     3        2        4
4     4        1        3
5     5        1        3
6     6        1        3
7     7        1        3
8     8        1        3
9     9        1        3
My (working) code:
mask = df['test'] > 3
m_len = len(mask)
df['column1'], df['column2'] = np.where([mask, mask], [[1]*m_len, [3]*m_len], [[2]*m_len, [4]*m_len])
Question:
Normally np.where() accepts a condition array and static values, for example:
np.where(mask, 1, 2) # where mask is a series
My expectation was that if I now use this:
np.where([mask, mask], [1, 3], [2, 4])
it would broadcast these values.
But I get the following error:
ValueError: operands could not be broadcast together with shapes (2,10) (2,) (2,)
Is there a way to broadcast the values without having to use the m_len variable (as shown in my working code)?
Note: I know I can just use np.where multiple times, on multiple lines, but I want to solve it in a one-liner.
Reputation: 18315
If you give the values a shape of (2, 1), they will broadcast. Here is one way to do that with np.r_:
df[["col1", "col2"]] = np.where(mask, np.r_["c", 1, 3], np.r_["c", 2, 4]).T
where the trailing .T is needed because np.where returns a (2, -1)-shaped array, but pandas expects (-1, 2) for its two columns.
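The same (2, 1) shape can also be built without np.r_. The following is only a sketch, assuming the df and mask from the question; the names if_true, if_false and result are illustrative. It uses nested lists to create the column vectors and unpacks the two rows of the result the same way the question's working code does, which sidesteps the transpose:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'test': range(0, 10)})
mask = df['test'] > 3

# Column vectors of shape (2, 1): one row per output column
if_true = np.array([[1], [3]])
if_false = np.array([[2], [4]])

# The mask has shape (10,), so the broadcast result has shape (2, 10)
result = np.where(mask.to_numpy(), if_true, if_false)

# Unpacking iterates over the two rows, so no transpose is needed here
df['column1'], df['column2'] = result
print(df)
```

Whether this reads better than np.r_ is a matter of taste; the nested lists make the (2, 1) shape visible at a glance.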
We can also pass only one mask when both masks are the same, since it will be broadcast too:
mask -> (10,)
values -> (2, 1)
then
mask' -> (1, 10)
values -> (2, 1)
and lastly
mask'' -> (2, 10)
values' -> (2, 10)
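These steps can be checked directly. A small sketch, assuming a plain NumPy boolean array in place of the pandas mask (the variable names are illustrative), that reproduces each stage and also shows why the (2,)-shaped values from the question fail against a (2, 10) mask:

```python
import numpy as np

mask = np.arange(10) > 3          # shape (10,)
values = np.array([[1], [3]])     # shape (2, 1)

# Step 1: the lower-dimensional operand gains a leading axis of length 1
mask_2d = mask[np.newaxis, :]     # shape (1, 10)

# Step 2: axes of length 1 are stretched to match the other operand
mask_b, values_b = np.broadcast_arrays(mask_2d, values)

print(mask.shape, values.shape)       # (10,) (2, 1)
print(mask_2d.shape)                  # (1, 10)
print(mask_b.shape, values_b.shape)   # (2, 10) (2, 10)

# Shapes (2, 10) and (2,) do not broadcast: the trailing axes are 10 and 2
try:
    np.broadcast_arrays(np.empty((2, 10)), np.empty(2))
    failed = False
except ValueError:
    failed = True
print(failed)                         # True
```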