Reputation: 1133
I have a data set (sample) like below
Date Value
2019-05-01 0
2019-05-02 0
2019-05-03 0
2019-05-04 0
2019-05-05 0
2019-05-06 0
2019-05-07 0
2019-05-08 1
2019-05-09 0
I want to transform it such that, if I encounter Value=1, then I take the 3 values from 2 days before and fill it as 1. Also set the current value to be 0. In other words, the transformed data set should look like this
Date Value
2019-05-01 0
2019-05-02 0
2019-05-03 1
2019-05-04 1
2019-05-05 1
2019-05-06 0
2019-05-07 0
2019-05-08 0
2019-05-09 0
Do notice, that in the example above, 2019-05-08 was set to 0 after transformation, and 2019-05-03 to 2019-05-05 was set to 1 (last value set to 1 is 2 days before 2019-05-08 and 3 days preceding 2019-05-05 is also set to 1). If two consecutive values show up as 1, we start the date calculation from the last value that shows up as 1. I think I can do this via for loops, but was looking to see if any inbuilt functions can help me with this. Thanks!
Upvotes: 4
Views: 192
Reputation: 29327
No sure about the efficiency of this solution because one needs to create three new columns but this also works:
df['shiftedValues'] = \
df['Value'].shift(-3, fill_value=0) + \
df['Value'].shift(-4, fill_value=0) + \
df['Value'].shift(-5, fill_value=0)
Note that the shift is done by row and not by day.
To shift by actual days I would first index by dates
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df['shiftedValues'] = \
df['Value'].shift(-3, freq='1D', fill_value=0).asof(df.index) + \
df['Value'].shift(-4, freq='1D', fill_value=0).asof(df.index) + \
df['Value'].shift(-5, freq='1D', fill_value=0).asof(df.index)
# Out:
# Value shiftedValues
# Date
# 2019-05-01 0 0.0
# 2019-05-02 0 0.0
# 2019-05-03 0 1.0
# 2019-05-04 0 1.0
# 2019-05-05 0 1.0
# 2019-05-06 0 0.0
# 2019-05-07 0 0.0
# 2019-05-08 1 0.0
# 2019-05-09 0 0.0
Now this works correctly for dates, for instance if df is (note the missing and repeated days)
Date Value
0 2019-05-01 0
1 2019-05-02 0
2 2019-05-03 0
3 2019-05-04 0
4 2019-05-05 0
5 2019-05-05 0
6 2019-05-07 0
7 2019-05-08 1
8 2019-05-09 0
then you get
Value shiftedValues
Date
2019-05-01 0 0.0
2019-05-02 0 0.0
2019-05-03 0 1.0
2019-05-04 0 1.0
2019-05-05 0 1.0
2019-05-05 0 1.0
2019-05-07 0 0.0
2019-05-08 1 0.0
2019-05-09 0 0.0
Upvotes: 0
Reputation: 7361
A one line solution, assuming that df
is your original dataframe:
df['Value'] = pd.Series([1 if 1 in df.iloc[i+3:i+6].values else 0 for i in df.index])
Here I work on index rather than dates, so I assume that you have one day per row and days are consecutive as shown in your example.
To fit also for this request:
If two consecutive values show up as 1, we start the date calculation from the last value that shows up as 1.
I can propose a two line solution:
validones = [True if df.iloc[i]['Value'] == 1 and df.iloc[i+1]['Value'] == 0 else False for i in df.index]
df['Value'] = pd.Series([1 if any(validones[i+3:i+6]) else 0 for i in range(len(validones))])
Basically first I build a list of boolean to check if the 1 in df['Value']
is not followed by another 1 and use this boolean list to perform the substitutions.
Upvotes: 1
Reputation: 5463
There could be more precise ways of solving this problem. However, I could only think of solving this using the index values(say i) where Value==1
and then grab the index values at preceding locations(2 dates before means i-3 and then two more values above it means i-4, i-5) and assign the Value
to 1. Finally, set the Value
back to 0 for the index location(s) that were originally found for Value==1
.
In [53]: df = pd.DataFrame({'Date':['2019-05-01','2019-05-02', '2019-05-03','2019-05-04','2019-05-05', '2019-05-06','20
...: 19-05-07','2019-05-08','2019-05-09'], 'Value':[0,0,0,0,0,0,0,1,0]})
...:
...:
In [54]: val_1_index = df.loc[df.Value == 1].index.tolist()
In [55]: val_1_index_decr = [(i-3, i-4, i-5) for i in val_1_index]
In [56]: df.loc[df['Value'].index.isin([i for i in val_1_index_decr[0]]), 'Value'] = 1
In [57]: df.loc[df['Value'].index.isin(val_1_index), 'Value'] = 0
In [58]: df
Out[58]:
Date Value
0 2019-05-01 0
1 2019-05-02 0
2 2019-05-03 1
3 2019-05-04 1
4 2019-05-05 1
5 2019-05-06 0
6 2019-05-07 0
7 2019-05-08 0
8 2019-05-09 0
Upvotes: 1