Reputation: 7448
I have the following df
,
cluster_id date
1 2018-01-02
1 2018-02-01
1 2018-03-30
2 2018-04-01
2 2018-04-23
2 2018-05-18
3 2018-06-01
3 2018-07-30
3 2018-09-30
I like to create a boolean column recur_pmt
, which is set to True
if all differences between consecutive values of date
in each cluster (df.groupby('cluster_id')
) are 30 < x < 40
; and False
otherwise. So the result is like,
cluster_id date recur_pmt
1 2018-01-02 False
1 2018-02-01 False
1 2018-03-30 False
2 2018-04-01 True
2 2018-04-23 True
2 2018-05-18 True
3 2018-06-01 False
3 2018-07-30 False
3 2018-09-30 False
I tried
df['recur_pmt'] = df.groupby('cluster_id')['date'].apply(
lambda x: (20 < x.diff().dropna().dt.days < 40).all())
but it did not work. I am also wondering can it use transform
as well in this case.
Upvotes: 1
Views: 29
Reputation: 862591
Use transform
with Series.between
and parameter inclusive=False
:
df['recur_pmt'] = df.groupby('cluster_id')['date'].transform(
lambda x: (x.diff().dropna().dt.days.between(20, 40, inclusive=False)).all())
print (df)
cluster_id date recur_pmt
0 1 2018-01-02 False
1 1 2018-02-01 False
2 1 2018-03-30 False
3 2 2018-04-01 True
4 2 2018-04-23 True
5 2 2018-05-18 True
6 3 2018-06-01 False
7 3 2018-07-30 False
8 3 2018-09-30 False
Upvotes: 1