Reputation: 613
I have this dataframe
date qq q_t l_t
1956-01-01 1 4 True
1956-01-02 2 5 True
1956-01-03 3 1 False
1956-01-04 4 1 False
1956-01-05 5 1 False
1956-01-06 6 10 True
1956-01-07 7 11 True
1956-01-08 8 12 True
1956-01-09 9 5 False
1956-01-10 10 3 False
1956-01-11 11 3 False
1956-01-12 12 3 False
1956-01-13 13 50 True
1956-01-14 14 51 True
1956-01-15 15 52 True
1956-01-16 16 53 True
1956-01-17 17 1 False
1956-01-18 18 23 True
1956-01-19 19 1 False
I would like to find the number of successive days with true values and store the initial and final date. In other words, I would like to create a new dataframe as:
index days start end
1 2 1956-01-01 1956-01-02
2 3 1956-01-06 1956-01-08
3 4 1956-01-50 1956-01-53
4 1 1956-01-18 1956-01-18
I thought to work with np.where and then do a cycle all over the dataframe, a sort of
_fref = np.where(dfr_['l_t'] == 'True')
for i in range(0, len(_fref)-1):
i_1 = _fref[i]
i_2 = _fref[i+1]
deltat = i_2-i_1
if deltat == 1:
...
...
It seems not very elegant and I am pretty sure that there are different way to do that. What do you think? Is it better to stay stick with the cycle strategy?
Upvotes: 1
Views: 53
Reputation: 817
Here's my try
df['date'] = pd.to_datetime(df['date'])
df['group'] = (df['l_t'] != df['l_t'].shift()).cumsum()
true_values = df[df['l_t']]
# Calculate the duration (number of successive days) for each group of True values
consecutive_days = true_values.groupby('group')['date'].agg(['min', 'max', 'count'])
consecutive_days = consecutive_days.reset_index()
print(consecutive_days)
Hope this works for you
Upvotes: 3