Galaffer
Galaffer

Reputation: 171

Days between this and next time a column value is True?

I am trying to do a date calculation counting days passing between events in a non-date column in pandas.

I have a pandas dataframe that looks something like this:

df = pd.DataFrame({'date':[
    '01.01.2020','02.01.2020','03.01.2020','10.01.2020', 
    '01.01.2020','04.02.2020','20.02.2020','21.02.2020', 
    '01.02.2020','10.02.2020','20.02.2020','20.03.2020'],
    'user_id':[1,1,1,1,2,2,2,2,3,3,3,3],
    'other_val':[0,0,0,100,0,100,0,10,10,0,0,10],
    'booly':[True, False, False, True,
            True, False, False, True, 
            True, True, True, True]})

Now, I've been unable to figure out how to create a new column stating the number of days that passed between each True value in the 'booly' column, for each user. So for each row with a True in the 'booly' column, how many days is it until the next row with a True in the 'booly' column occurs, like so:

    date    user_id  booly   days_until_next_booly
01.01.2020  1        True    9   
02.01.2020  1        False   None
03.01.2020  1        False   None
10.01.2020  1        True    None   
01.01.2020  2        True    51
04.02.2020  2        False   None
20.02.2020  2        False   None
21.01.2020  2        True    None
01.02.2020  3        True    9
10.02.2020  3        True    10
20.02.2020  3        True    29
20.03.2020  3        True    None

Upvotes: 1

Views: 87

Answers (2)

Rik Kraan
Rik Kraan

Reputation: 586

(
    df
    # fist convert the date column to datetime format
    .assign(date=lambda x: pd.to_datetime(x['date'], dayfirst=True))
    # sort your dates
    .sort_values('date')
    # calculate the difference between subsequent dates
    .assign(date_diff=lambda x: x['date'].diff(1).shift(-1))
    # Groupby your booly column to calculate the cumulative days between True values
    .assign(date_diff_cum=lambda x: x.groupby(x['booly'].cumsum())['date_diff'].transform('sum').where(x['booly'] == True))
)

Output:
date    user_id     other_val   booly   date_diff   date_diff_cum
2020-01-01  2       0           True    1 days      9 days
2020-01-02  1       0           False   1 days      NaT
2020-01-03  1       0           False   7 days      NaT
2020-01-10  1       100         True    22 days     22 days
2020-02-01  1       0           True    0 days      0 days
2020-02-01  3       10          True    3 days      9 days
2020-02-04  2       10          False   6 days      NaT
2020-02-10  3       0           True    10 days     10 days
2020-02-20  2       100         False   0 days      NaT
2020-02-20  3       0           True    1 days      1 days
2020-02-21  2       0           True    28 days     28 days
2020-03-20  3       10          True    NaT         0 days

Upvotes: 1

It_is_Chris
It_is_Chris

Reputation: 14113

# sample data 
df = pd.DataFrame({'date':[
    '01.01.2020','02.01.2020','03.01.2020','10.01.2020', 
    '01.01.2020','04.02.2020','20.02.2020','21.02.2020', 
    '01.02.2020','10.02.2020','20.02.2020','20.03.2020'],
    'user_id':[1,1,1,1,2,2,2,2,3,3,3,3],
    'other_val':[0,0,0,100,0,100,0,10,10,0,0,10],
    'booly':[True, False, False, True,
            True, False, False, True, 
            True, True, True, True]})

# convert data to date time format
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# use loc with groupby to calculate the difference between True values
df.loc[df['booly'] == True, 'days_until_next_booly'] = df.loc[df['booly'] == True].groupby('user_id')['date'].diff().shift(-1)


         date  user_id  other_val  booly days_until_next_booly
0  2020-01-01        1          0   True                9 days
1  2020-01-02        1          0  False                   NaT
2  2020-01-03        1          0  False                   NaT
3  2020-01-10        1        100   True                   NaT
4  2020-01-01        2          0   True               51 days
5  2020-02-04        2        100  False                   NaT
6  2020-02-20        2          0  False                   NaT
7  2020-02-21        2         10   True                   NaT
8  2020-02-01        3         10   True                9 days
9  2020-02-10        3          0   True               10 days
10 2020-02-20        3          0   True               29 days
11 2020-03-20        3         10   True                   NaT

Upvotes: 1

Related Questions