Reputation: 147
I'm looking to find a way to determine if a time in a column falls within 7 days of another date in that same column.
Say this is my data frame-
dic = {'firstname':['Rick','Rick','Rick','John','John','John','David',
'David','David','Steve','Steve','Steve','Jim','Jim',
'Jim'],
'lastname':['Smith','Smith','Smith','Jones','Jones','Jones',
'Wilson','Wilson','Wilson','Johnson','Johnson',
'Johnson','Miller','Miller','Miller'],
'company':['CFA','CFA','CFA','WND','WND','WND','INO','INO','INO',
'CHP','CHP','CHP','MCD','MCD','MCD'],
'faveday':['2020-03-16','2020-03-11','2020-03-25','2020-04-30',
'2020-05-22','2020-05-03','2020-01-31','2020-01-13',
'2020-01-10','2020-10-22','2020-10-28','2020-10-22',
'2020-10-13','2020-10-28','2020-10-20']}
df = pd.DataFrame(dic)
df['faveday'] = pd.to_datetime(df['faveday'])
print(df)
With output-
firstname lastname company faveday
0 Rick Smith CFA 2020-03-16
1 Rick Smith CFA 2020-03-11
2 Rick Smith CFA 2020-03-25
3 John Jones WND 2020-04-30
4 John Jones WND 2020-05-22
5 John Jones WND 2020-05-03
6 David Wilson INO 2020-01-31
7 David Wilson INO 2020-01-13
8 David Wilson INO 2020-01-10
9 Steve Johnson CHP 2020-10-22
10 Steve Johnson CHP 2020-10-28
11 Steve Johnson CHP 2020-10-22
12 Jim Miller MCD 2020-10-13
13 Jim Miller MCD 2020-10-28
14 Jim Miller MCD 2020-10-20
And then I sort the data with-
df = df.sort_values(['firstname','lastname','company','faveday'])
print(df)
to get-
firstname lastname company faveday
8 David Wilson INO 2020-01-10
7 David Wilson INO 2020-01-13
6 David Wilson INO 2020-01-31
12 Jim Miller MCD 2020-10-13
14 Jim Miller MCD 2020-10-20
13 Jim Miller MCD 2020-10-28
3 John Jones WND 2020-04-30
5 John Jones WND 2020-05-03
4 John Jones WND 2020-05-22
1 Rick Smith CFA 2020-03-11
0 Rick Smith CFA 2020-03-16
2 Rick Smith CFA 2020-03-25
9 Steve Johnson CHP 2020-10-22
11 Steve Johnson CHP 2020-10-22
10 Steve Johnson CHP 2020-10-28
Say I want to know in this current order (index 8 then 7, 6, 12, etc.) whether or not a date is within 7 days of another. (So index 8 and 7 would both yield true but index 6 would not)
But I also would like to keep it grouped by name. (So index 12 and 14 would be true and 13 would not in the Jim Miller group, but index 9, 11, and 10 would all be true in the Steve Johnson group)
Is there a way to subtract dates within each group and then create a column to say TRUE or FALSE dependent upon it being within 7 days of another day?
I'm looking for an output like this-
firstname lastname company faveday seven_days
8 David Wilson INO 2020-01-10 TRUE
7 David Wilson INO 2020-01-13 TRUE
6 David Wilson INO 2020-01-31 FALSE
12 Jim Miller MCD 2020-10-13 TRUE
14 Jim Miller MCD 2020-10-20 TRUE
13 Jim Miller MCD 2020-10-28 FALSE
3 John Jones WND 2020-04-30 TRUE
5 John Jones WND 2020-05-03 TRUE
4 John Jones WND 2020-05-22 FALSE
1 Rick Smith CFA 2020-03-11 TRUE
0 Rick Smith CFA 2020-03-16 TRUE
2 Rick Smith CFA 2020-03-25 FALSE
9 Steve Johnson CHP 2020-10-22 TRUE
11 Steve Johnson CHP 2020-10-22 TRUE
10 Steve Johnson CHP 2020-10-28 TRUE
Upvotes: 1
Views: 34
Reputation: 20669
You can try this.
from datetime import timedelta
m = (df.groupby(['firstname','lastname']).
apply(lambda x: x['faveday'].sub(x['faveday'].shift()).bfill()).
reset_index(level=[0,1],drop=True))
df['seven_days'] = m.le(timedelta(days=7))
firstname lastname company faveday seven_days
8 David Wilson INO 2020-01-10 True
7 David Wilson INO 2020-01-13 True
6 David Wilson INO 2020-01-31 False
12 Jim Miller MCD 2020-10-13 True
14 Jim Miller MCD 2020-10-20 True
13 Jim Miller MCD 2020-10-28 False
3 John Jones WND 2020-04-30 True
5 John Jones WND 2020-05-03 True
4 John Jones WND 2020-05-22 False
1 Rick Smith CFA 2020-03-11 True
0 Rick Smith CFA 2020-03-16 True
2 Rick Smith CFA 2020-03-25 False
9 Steve Johnson CHP 2020-10-22 True
11 Steve Johnson CHP 2020-10-22 True
10 Steve Johnson CHP 2020-10-28 True
Upvotes: 1
Reputation: 323226
Let us try self define a function with numpy
broadcast
def sefd (x):
return np.sum((np.abs(x.values-x.values[:,None])/np.timedelta64(1, 'D'))<=7,axis=1)>=2
s=df.groupby(['firstname', 'lastname', 'company'])['faveday'].transform(sefd)
Out[301]:
0 True
1 True
2 False
3 True
4 False
5 True
6 False
7 True
8 True
9 True
10 True
11 True
12 True
13 False
14 True
Name: faveday, dtype: bool
df['seven_days']=s
Upvotes: 1