Delto
Delto

Reputation: 147

Comparing and Subtracting Dates

I'm looking to find a way to determine if a time in a column falls within 7 days of another date in that same column.

Say this is my data frame-

dic = {'firstname':['Rick','Rick','Rick','John','John','John','David',
                    'David','David','Steve','Steve','Steve','Jim','Jim',
                    'Jim'],
       'lastname':['Smith','Smith','Smith','Jones','Jones','Jones',
                   'Wilson','Wilson','Wilson','Johnson','Johnson',
                   'Johnson','Miller','Miller','Miller'],
       'company':['CFA','CFA','CFA','WND','WND','WND','INO','INO','INO',
                  'CHP','CHP','CHP','MCD','MCD','MCD'],
       'faveday':['2020-03-16','2020-03-11','2020-03-25','2020-04-30',
                  '2020-05-22','2020-05-03','2020-01-31','2020-01-13',
                  '2020-01-10','2020-10-22','2020-10-28','2020-10-22',
                  '2020-10-13','2020-10-28','2020-10-20']}
df = pd.DataFrame(dic)
df['faveday'] = pd.to_datetime(df['faveday'])
print(df)

With output-

   firstname lastname company    faveday
0       Rick    Smith     CFA 2020-03-16
1       Rick    Smith     CFA 2020-03-11
2       Rick    Smith     CFA 2020-03-25
3       John    Jones     WND 2020-04-30
4       John    Jones     WND 2020-05-22
5       John    Jones     WND 2020-05-03
6      David   Wilson     INO 2020-01-31
7      David   Wilson     INO 2020-01-13
8      David   Wilson     INO 2020-01-10
9      Steve  Johnson     CHP 2020-10-22
10     Steve  Johnson     CHP 2020-10-28
11     Steve  Johnson     CHP 2020-10-22
12       Jim   Miller     MCD 2020-10-13
13       Jim   Miller     MCD 2020-10-28
14       Jim   Miller     MCD 2020-10-20

And then I sort the data with-

df = df.sort_values(['firstname','lastname','company','faveday'])
print(df)

to get-

   firstname lastname company    faveday
8      David   Wilson     INO 2020-01-10
7      David   Wilson     INO 2020-01-13
6      David   Wilson     INO 2020-01-31
12       Jim   Miller     MCD 2020-10-13
14       Jim   Miller     MCD 2020-10-20
13       Jim   Miller     MCD 2020-10-28
3       John    Jones     WND 2020-04-30
5       John    Jones     WND 2020-05-03
4       John    Jones     WND 2020-05-22
1       Rick    Smith     CFA 2020-03-11
0       Rick    Smith     CFA 2020-03-16
2       Rick    Smith     CFA 2020-03-25
9      Steve  Johnson     CHP 2020-10-22
11     Steve  Johnson     CHP 2020-10-22
10     Steve  Johnson     CHP 2020-10-28

Say I want to know in this current order (index 8 then 7, 6, 12, etc.) whether or not a date is within 7 days of another. (So index 8 and 7 would both yield true but index 6 would not)

But I also would like to keep it grouped by name. (So index 12 and 14 would be true and 13 would not in the Jim Miller group, but index 9, 11, and 10 would all be true in the Steve Johnson group)

Is there a way to subtract dates within each group and then create a column to say TRUE or FALSE dependent upon it being within 7 days of another day?

I'm looking for an output like this-

   firstname lastname company    faveday seven_days
8      David   Wilson     INO 2020-01-10       TRUE
7      David   Wilson     INO 2020-01-13       TRUE
6      David   Wilson     INO 2020-01-31      FALSE
12       Jim   Miller     MCD 2020-10-13       TRUE
14       Jim   Miller     MCD 2020-10-20       TRUE
13       Jim   Miller     MCD 2020-10-28      FALSE
3       John    Jones     WND 2020-04-30       TRUE
5       John    Jones     WND 2020-05-03       TRUE
4       John    Jones     WND 2020-05-22      FALSE
1       Rick    Smith     CFA 2020-03-11       TRUE
0       Rick    Smith     CFA 2020-03-16       TRUE
2       Rick    Smith     CFA 2020-03-25      FALSE
9      Steve  Johnson     CHP 2020-10-22       TRUE
11     Steve  Johnson     CHP 2020-10-22       TRUE
10     Steve  Johnson     CHP 2020-10-28       TRUE

Upvotes: 1

Views: 34

Answers (2)

Ch3steR
Ch3steR

Reputation: 20669

You can try this.

from datetime import timedelta

m = (df.groupby(['firstname','lastname']).
        apply(lambda x: x['faveday'].sub(x['faveday'].shift()).bfill()).
        reset_index(level=[0,1],drop=True))
df['seven_days'] = m.le(timedelta(days=7))

   firstname lastname company    faveday  seven_days
8      David   Wilson     INO 2020-01-10        True
7      David   Wilson     INO 2020-01-13        True
6      David   Wilson     INO 2020-01-31       False
12       Jim   Miller     MCD 2020-10-13        True
14       Jim   Miller     MCD 2020-10-20        True
13       Jim   Miller     MCD 2020-10-28       False
3       John    Jones     WND 2020-04-30        True
5       John    Jones     WND 2020-05-03        True
4       John    Jones     WND 2020-05-22       False
1       Rick    Smith     CFA 2020-03-11        True
0       Rick    Smith     CFA 2020-03-16        True
2       Rick    Smith     CFA 2020-03-25       False
9      Steve  Johnson     CHP 2020-10-22        True
11     Steve  Johnson     CHP 2020-10-22        True
10     Steve  Johnson     CHP 2020-10-28        True

Upvotes: 1

BENY
BENY

Reputation: 323226

Let us try self define a function with numpy broadcast

def sefd (x): 
    return np.sum((np.abs(x.values-x.values[:,None])/np.timedelta64(1, 'D'))<=7,axis=1)>=2
s=df.groupby(['firstname', 'lastname', 'company'])['faveday'].transform(sefd)
Out[301]: 
0      True
1      True
2     False
3      True
4     False
5      True
6     False
7      True
8      True
9      True
10     True
11     True
12     True
13    False
14     True
Name: faveday, dtype: bool
df['seven_days']=s

Upvotes: 1

Related Questions