FlyingPickle
FlyingPickle

Reputation: 1133

conditional backward fill a column in python

I have a data set (sample) like below

Date             Value
2019-05-01        0
2019-05-02        0
2019-05-03        0
2019-05-04        0
2019-05-05        0
2019-05-06        0
2019-05-07        0
2019-05-08        1
2019-05-09        0

I want to transform it such that, if I encounter Value=1, then I take the 3 values from 2 days before and fill it as 1. Also set the current value to be 0. In other words, the transformed data set should look like this

Date             Value
2019-05-01        0
2019-05-02        0
2019-05-03        1
2019-05-04        1
2019-05-05        1
2019-05-06        0
2019-05-07        0
2019-05-08        0
2019-05-09        0

Do notice, that in the example above, 2019-05-08 was set to 0 after transformation, and 2019-05-03 to 2019-05-05 was set to 1 (last value set to 1 is 2 days before 2019-05-08 and 3 days preceding 2019-05-05 is also set to 1). If two consecutive values show up as 1, we start the date calculation from the last value that shows up as 1. I think I can do this via for loops, but was looking to see if any inbuilt functions can help me with this. Thanks!

Upvotes: 4

Views: 192

Answers (3)

user2314737
user2314737

Reputation: 29327

No sure about the efficiency of this solution because one needs to create three new columns but this also works:

df['shiftedValues'] = \
           df['Value'].shift(-3, fill_value=0) + \
           df['Value'].shift(-4, fill_value=0) + \
           df['Value'].shift(-5, fill_value=0)

Note that the shift is done by row and not by day.

To shift by actual days I would first index by dates

df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

df['shiftedValues'] = \
df['Value'].shift(-3, freq='1D', fill_value=0).asof(df.index) + \
df['Value'].shift(-4, freq='1D', fill_value=0).asof(df.index) + \
df['Value'].shift(-5, freq='1D', fill_value=0).asof(df.index)
# Out:
#             Value  shiftedValues
# Date                            
# 2019-05-01      0            0.0
# 2019-05-02      0            0.0
# 2019-05-03      0            1.0
# 2019-05-04      0            1.0
# 2019-05-05      0            1.0
# 2019-05-06      0            0.0
# 2019-05-07      0            0.0
# 2019-05-08      1            0.0
# 2019-05-09      0            0.0

Now this works correctly for dates, for instance if df is (note the missing and repeated days)

         Date  Value
0  2019-05-01      0
1  2019-05-02      0
2  2019-05-03      0
3  2019-05-04      0
4  2019-05-05      0
5  2019-05-05      0
6  2019-05-07      0
7  2019-05-08      1
8  2019-05-09      0

then you get

            Value  shiftedValues
Date                            
2019-05-01      0            0.0
2019-05-02      0            0.0
2019-05-03      0            1.0
2019-05-04      0            1.0
2019-05-05      0            1.0
2019-05-05      0            1.0
2019-05-07      0            0.0
2019-05-08      1            0.0
2019-05-09      0            0.0

Upvotes: 0

Valentino
Valentino

Reputation: 7361

A one line solution, assuming that df is your original dataframe:

df['Value'] = pd.Series([1 if 1 in df.iloc[i+3:i+6].values else 0 for i in df.index])

Here I work on index rather than dates, so I assume that you have one day per row and days are consecutive as shown in your example.

To fit also for this request:

If two consecutive values show up as 1, we start the date calculation from the last value that shows up as 1.

I can propose a two line solution:

validones = [True if df.iloc[i]['Value'] == 1 and df.iloc[i+1]['Value'] == 0 else False for i in df.index]
df['Value'] = pd.Series([1 if any(validones[i+3:i+6]) else 0 for i in range(len(validones))])

Basically first I build a list of boolean to check if the 1 in df['Value'] is not followed by another 1 and use this boolean list to perform the substitutions.

Upvotes: 1

amanb
amanb

Reputation: 5463

There could be more precise ways of solving this problem. However, I could only think of solving this using the index values(say i) where Value==1 and then grab the index values at preceding locations(2 dates before means i-3 and then two more values above it means i-4, i-5) and assign the Value to 1. Finally, set the Value back to 0 for the index location(s) that were originally found for Value==1.

In [53]: df = pd.DataFrame({'Date':['2019-05-01','2019-05-02', '2019-05-03','2019-05-04','2019-05-05', '2019-05-06','20
    ...: 19-05-07','2019-05-08','2019-05-09'], 'Value':[0,0,0,0,0,0,0,1,0]})
    ...:
    ...:

In [54]: val_1_index = df.loc[df.Value == 1].index.tolist()

In [55]: val_1_index_decr = [(i-3, i-4, i-5) for i in val_1_index]

In [56]: df.loc[df['Value'].index.isin([i for i in val_1_index_decr[0]]), 'Value'] = 1

In [57]: df.loc[df['Value'].index.isin(val_1_index), 'Value'] = 0

In [58]: df
Out[58]:
         Date  Value
0  2019-05-01      0
1  2019-05-02      0
2  2019-05-03      1
3  2019-05-04      1
4  2019-05-05      1
5  2019-05-06      0
6  2019-05-07      0
7  2019-05-08      0
8  2019-05-09      0

Upvotes: 1

Related Questions