user2906657
user2906657

Reputation: 541

Dummy variable manipulation based on date range

I have a input dataset as follows:

date      event         A

2015-07-15  A     1
2015-07-16  A     1
.......
2015-07-22  A     1
2015-07-23  A     1   
........
2015-07-26  A     1

Here A is the dummy/indicator variable for event A. I want to further subset the dummy variable, where your A_new(new indicator variable) is 1 if date is in the range 2015-07-15 and 2015-07-22, else it remains 0.

Output is:
date       event        A  A_new

2015-07-15  A           1   1
2015-07-16  A           1   1
.......
2015-07-22  A           1   1
2015-07-23  A           1   0   
........
2015-07-26  A           1   0

My code looks like:

df['A_new']=df['A']

df.loc[~((df.date >='2015-07-15') & (df.date <='2015-07-22')),'A_new']=0  

But still I am not get desired result. Just wonder my logic is correct.

Upvotes: 2

Views: 1527

Answers (1)

jezrael
jezrael

Reputation: 863411

Simplier solution is cast boolean mask to int - so True are 1 and False are 0:

df['A_new'] = ((df.date >='2015-07-15') & (df.date <='2015-07-22')).astype(int)
print (df)
        date event  A  A_new
0 2015-07-15     A  1      1
1 2015-07-16     A  1      1
2 2015-07-22     A  1      1
3 2015-07-23     A  1      0
4 2015-07-26     A  1      0

Another solution with between:

df['A_new'] = df.date.between('2015-07-15','2015-07-22').astype(int)
print (df)
        date event  A  A_new
0 2015-07-15     A  1      1
1 2015-07-16     A  1      1
2 2015-07-22     A  1      1
3 2015-07-23     A  1      0
4 2015-07-26     A  1      0

As piRSquared pointed if need values by column A multiple by A (thank you):

print (df)
         date event  A
0  2015-07-15     A  1
1  2015-07-16     A  6
2  2015-07-22     A  2
3  2015-07-23     A  1
4  2015-07-26     A  1

df['A_new'] = df.date.between('2015-07-15','2015-07-22').astype(int).mul(df.A)
print (df)
         date event  A  A_new
0  2015-07-15     A  1      1
1  2015-07-16     A  6      6
2  2015-07-22     A  2      2
3  2015-07-23     A  1      0
4  2015-07-26     A  1      0

Solution with where:

df['A_new'] = df.A.where(df.date.between('2015-07-15','2015-07-22'), 0)
print (df)
         date event  A  A_new
0  2015-07-15     A  1      1
1  2015-07-16     A  6      6
2  2015-07-22     A  2      2
3  2015-07-23     A  1      0
4  2015-07-26     A  1      0

Upvotes: 3

Related Questions