Siddharth

Reputation: 373

Switch consecutive values in column until condition changes

I'm trying to compare two values (these can be two columns or even two arbitrary values), and the result of that comparison will be used to fill a pandas column. The comparison itself is not a problem and can be done using np.where. My goal, however, is a conditional switch. For example, when the comparison evaluates to true, all values from that row until the comparison next becomes true will be X. When the condition becomes true again, the fill value switches to Y, then back to X on the next true, and so on until all comparisons are done.
Similar to this: Repeat the value in column until a change occurs

Minimum working example to create dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.uniform(0, .01, size=(100, 1)),
                  columns=list('A'))
df['condition'] = np.where(df['A'] > np.random.uniform(0.0, 0.5),
                           'Switch', 'DoNotSwitch')

print(df)

           A    condition
0   0.004392  DoNotSwitch
1   0.002493  DoNotSwitch
2   0.007735  DoNotSwitch
3   0.000523  Switch
4   0.009902  DoNotSwitch
5   0.001712  DoNotSwitch
6   0.006146  DoNotSwitch
7   0.005325  Switch
8   0.003555  DoNotSwitch
9   0.003444  DoNotSwitch
10  0.005225  DoNotSwitch
11  0.000619  DoNotSwitch

What I want is:

           A    condition    Result
0   0.004392  DoNotSwitch    T
1   0.002493  DoNotSwitch    T
2   0.007735  DoNotSwitch    T
3   0.000523  Switch         F
4   0.009902  DoNotSwitch    F
5   0.001712  DoNotSwitch    F
6   0.006146  DoNotSwitch    F
7   0.005325  Switch         T
8   0.003555  DoNotSwitch    T
9   0.003444  DoNotSwitch    T
10  0.005225  DoNotSwitch    T
11  0.000619  DoNotSwitch    T

Is there a way to condense np.where and the generation of the Result column into a single step? The condition column only indicates where the switch should happen and is not itself needed.

Upvotes: 1

Views: 266

Answers (2)

Paul Panzer

Reputation: 53029

Switching a boolean old_value conditional on another boolean switch can be written using XOR:

new_value = old_value ^ switch
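
As a quick sanity check of the XOR rule, here is a minimal sketch that enumerates all four combinations. The name `table` is only for illustration and is not part of the answer.

```python
# Enumerate every (old_value, switch) pair and the value after applying XOR.
table = [(old_value, switch, old_value ^ switch)
         for old_value in (False, True)
         for switch in (False, True)]

for old_value, switch, new_value in table:
    # switch=False keeps old_value; switch=True flips it
    print(old_value, switch, new_value)
```

A False switch leaves the value unchanged and a True switch inverts it, which is the toggle behaviour asked for.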

To do it sequentially we can use the accumulate method of np.bitwise_xor:

# create random switch points with 20% probability
switches = np.random.random(20) < 0.2
values = np.bitwise_xor.accumulate(switches)

# display them side by side
np.c_[switches,values]
# array([[False, False],
#        [False, False],
#        [ True,  True],
#        [False,  True],
#        [False,  True],
#        [False,  True],
#        [ True, False],
#        [False, False],
#   ...

This is almost what we want, except that it starts on the wrong side. The easiest remedy is to simply flip the result:

values ^= start_value

Alternatively, for very long sequences it may be more economical to switch only the first element of switches:

switches[0] ^= start_value
values = np.bitwise_xor.accumulate(switches)
# restore switches to original state
switches[0] ^= start_value
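
Putting this together for the question's sample, a rough sketch might look like the following. The switch positions (rows 3 and 7) are copied from the question's printed table, and `start_value = True` is an assumption meaning the column should begin with 'T'.

```python
import numpy as np
import pandas as pd

# Switch flags from the question's 12-row sample: True at rows 3 and 7.
switches = np.zeros(12, dtype=bool)
switches[[3, 7]] = True

start_value = True  # assumption: the first block should be labelled 'T'

# The running XOR tracks the parity of switches seen so far;
# XOR-ing with start_value selects which state the sequence begins in.
values = np.bitwise_xor.accumulate(switches) ^ start_value

df = pd.DataFrame({'switch': switches})
df['Result'] = np.where(values, 'T', 'F')
print(df)
```

The row that carries the switch already takes the new label, which matches the desired output in the question.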

Upvotes: 1

Quang Hoang

Reputation: 150735

IIUC,

s = df['condition'].eq('Switch')

df['result'] = np.where(s.cumsum() % 2 == 0,
                        'T', 'F')

Note that s here is exactly your condition:

df['A'] > np.random.uniform(0.0, 0.5)

So you can bypass the construction of df['condition'] like this:

s = df['A'] > np.random.uniform(0.0, 0.5)
df['result'] = np.where(s.cumsum() % 2 == 0, 'T', 'F')

Output:

           A    condition result
0   0.004392  DoNotSwitch      T
1   0.002493  DoNotSwitch      T
2   0.007735  DoNotSwitch      T
3   0.000523       Switch      F
4   0.009902  DoNotSwitch      F
5   0.001712  DoNotSwitch      F
6   0.006146  DoNotSwitch      F
7   0.005325       Switch      T
8   0.003555  DoNotSwitch      T
9   0.003444  DoNotSwitch      T
10  0.005225  DoNotSwitch      T
11  0.000619  DoNotSwitch      T
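
Since the question also mentions comparing two columns, here is a small sketch of the same cumsum idea with the comparison done inline. The column B and the fixed numbers are made up for illustration so that the switch falls on rows 3 and 7.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question): A exceeds B only at rows 3 and 7.
df = pd.DataFrame({'A': [1, 2, 3, 9, 1, 2, 3, 9, 1, 2, 3, 1],
                   'B': [5] * 12})

# Count the switches seen so far; an even count means 'T', an odd count means 'F'.
df['result'] = np.where((df['A'] > df['B']).cumsum() % 2 == 0, 'T', 'F')
print(df)
```

No intermediate condition column is needed here, because the boolean comparison feeds cumsum directly.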

Upvotes: 2
