Reputation: 373
I'm trying to compare two values (these can be two columns or even two arbitrary values); the result of the comparison will be used to fill values in a pandas column. The comparison itself is not a problem and can be done using np.where. But my goal is to have a conditional switch. For example, if the comparison results in true, then all the values until the comparison becomes true again will be X. When the condition becomes true again, the values to be filled switch to Y, then back to X on the next true, and so on until all comparisons are done.
Similar to this: Repeat the value in column until a change occurs
Minimum working example to create dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.uniform(0, .01, size=(100, 1)),
                  columns=list('A'))
df['condition'] = np.where(df['A'] > np.random.uniform(0.0, 0.5),
                           'Switch', 'DoNotSwitch')
print(df)
           A    condition
0   0.004392  DoNotSwitch
1   0.002493  DoNotSwitch
2   0.007735  DoNotSwitch
3   0.000523       Switch
4   0.009902  DoNotSwitch
5   0.001712  DoNotSwitch
6   0.006146  DoNotSwitch
7   0.005325       Switch
8   0.003555  DoNotSwitch
9   0.003444  DoNotSwitch
10  0.005225  DoNotSwitch
11  0.000619  DoNotSwitch
What I want is:
           A    condition  Result
0   0.004392  DoNotSwitch       T
1   0.002493  DoNotSwitch       T
2   0.007735  DoNotSwitch       T
3   0.000523       Switch       F
4   0.009902  DoNotSwitch       F
5   0.001712  DoNotSwitch       F
6   0.006146  DoNotSwitch       F
7   0.005325       Switch       T
8   0.003555  DoNotSwitch       T
9   0.003444  DoNotSwitch       T
10  0.005225  DoNotSwitch       T
11  0.000619  DoNotSwitch       T
Is there a way to condense np.where and the generation of the Result column into a single step, since the condition column only indicates where the switch should happen and is not itself needed?
Upvotes: 1
Views: 266
Reputation: 53029
Switching a boolean old_value conditional on another boolean switch can be written using XOR:
new_value = old_value ^ switch
To do this sequentially, we can use the accumulate method of np.bitwise_xor:
# create random switch points with 20% probability
switches = np.random.random(20) < 0.2
values = np.bitwise_xor.accumulate(switches)
# display them side by side
np.c_[switches,values]
# array([[False, False],
# [False, False],
# [ True, True],
# [False, True],
# [False, True],
# [False, True],
# [ True, False],
# [False, False],
# ...
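This is almost what we want, except that it implicitly starts from False, which may be the wrong side. The easiest remedy is to XOR the result with the desired start_value, which flips everything when start_value is True: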
values ^= start_value
Alternatively, for very long sequences it may be more economical to flip only the first element of switches:
switches[0] ^= start_value
values = np.bitwise_xor.accumulate(switches)
# restore switches to original state
switches[0] ^= start_value
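As a rough sketch of how the pieces above fit together, the running XOR and the start-value flip can be wrapped in one small function. The name toggle_states and its parameters are hypothetical, chosen only for illustration; the switch positions (rows 3 and 7) are taken from the table in the question.

```python
import numpy as np


def toggle_states(switches, start_value=False):
    """Return the running on/off state after each switch point.

    `toggle_states` and its parameter names are hypothetical helpers
    for illustration; only np.bitwise_xor.accumulate comes from the answer.
    """
    switches = np.asarray(switches, dtype=bool)
    # Running XOR: the state flips at every True in `switches`.
    states = np.bitwise_xor.accumulate(switches)
    # XOR with the start value flips the whole sequence when it is True.
    return states ^ bool(start_value)


# Switch points at positions 3 and 7, as in the question's example.
switches = np.zeros(12, dtype=bool)
switches[[3, 7]] = True

states = toggle_states(switches, start_value=True)
labels = np.where(states, 'T', 'F')
print(''.join(labels))  # TTTFFFFTTTTT
```

Mapping the boolean states through np.where at the end means the two fill values can be anything (X and Y in the question's wording), not just 'T' and 'F'.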
Upvotes: 1
Reputation: 150735
IIUC,
s = df['condition'].eq('Switch')
df['result'] = np.where(s.cumsum() % 2 == 0, 'T', 'F')
Note that s here is exactly your condition:
df['A'] > np.random.uniform(0.0, 0.5)
So you can bypass the construction of df['condition'] like this:
s = df['A'] > np.random.uniform(0.0, 0.5)
df['result'] = np.where(s.cumsum() % 2 == 0, 'T', 'F')
Output:
           A    condition  result
0   0.004392  DoNotSwitch       T
1   0.002493  DoNotSwitch       T
2   0.007735  DoNotSwitch       T
3   0.000523       Switch       F
4   0.009902  DoNotSwitch       F
5   0.001712  DoNotSwitch       F
6   0.006146  DoNotSwitch       F
7   0.005325       Switch       T
8   0.003555  DoNotSwitch       T
9   0.003444  DoNotSwitch       T
10  0.005225  DoNotSwitch       T
11  0.000619  DoNotSwitch       T
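Because the question's data is random, the table above cannot be regenerated exactly. The following is a minimal reproducible sketch of the same cumsum-parity idea, assuming the switch rows (3 and 7) are fixed by hand to match the printed table; the names switch and flipped are illustrative only.

```python
import numpy as np
import pandas as pd

# The first 12 values of column A as printed in the question.
df = pd.DataFrame({'A': [0.004392, 0.002493, 0.007735, 0.000523,
                         0.009902, 0.001712, 0.006146, 0.005325,
                         0.003555, 0.003444, 0.005225, 0.000619]})

# Assumption: the switch rows (3 and 7) are set by hand to match the
# printed table; in real use this mask would come from the comparison.
switch = pd.Series(False, index=df.index)
switch.loc[[3, 7]] = True

# An even count of switches so far keeps the starting label ('T');
# an odd count means the label is currently flipped ('F').
flipped = switch.cumsum() % 2 == 1
df['result'] = np.where(flipped, 'F', 'T')
print(df['result'].tolist())
```

The running count of True values plays the same role as the running XOR in the other answer: only its parity matters.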
Upvotes: 2